Abstract

Typestate systems ensure many desirable properties of imperative 
programs, including initialization of object fields and correct use of 
stateful library interfaces. Abstract sets with cardinality constraints 
naturally generalize typestate properties: relationships between the 
typestates of objects can be expressed as subset and disjointness 
relations on sets, and elements of sets can be represented as sets 
of cardinality one. In addition, sets with cardinality constraints 
provide a natural language for specifying operations and invariants 
of data structures. 

Motivated by these program analysis applications, this paper 
presents new algorithms and new complexity results for constraints 
on sets and their cardinalities. We study several classes of con- 
straints and demonstrate a trade-off between their expressive power 
and their complexity. 

Our first result concerns a quantifier-free fragment of Boolean 
Algebra with Presburger Arithmetic. We give a nondeterministic 
polynomial-time algorithm for reducing the satisfiability of sets 
with symbolic cardinalities to constraints on constant cardinalities, 
and give a polynomial-space algorithm for the resulting problem. 
The best previously existing algorithm runs in exponential space 
and nondeterministic exponential time. 

In a quest for more efficient fragments, we identify several 
subclasses of sets with cardinality constraints whose satisfiabil- 
ity is NP-hard. Finally, we identify a class of constraints that has 
polynomial-time satisfiability and entailment problems and can 
serve as a foundation for efficient program analysis. We give a sys- 
tem of rewriting rules for enforcing certain consistency properties 
of these constraints and show how to extract complete information 
from constraints in normal form. This result implies the soundness 
and completeness of our algorithms. 



1. Introduction 

Program analyses that reason about deep semantic properties are of 
great value for software development; the value of such analyses 
is growing with the adoption of language constructs that eliminate 
low-level program errors. Many deep semantic properties are natu- 
rally expressible in fragments of set theory, so constraint solving for 
such fragments is of interest. This paper presents new algorithms 
and improved complexity bounds for fragments of set theory. The 
starting point of our constraints is the boolean algebra of finite (but 
unbounded) sets. 

Sets in program analysis. The boolean algebra of finite sets 
is a fragment of set theory that allows the basic set operations 
of intersection, union, and complement on sets of uninterpreted 
elements. Although simple, it turns out that this fragment can 



express many properties of interest in program analysis. Examples 
include typestate properties and public interfaces of data structures. 

Set specifications generalize typestate properties [29, 26]: the
fact that an object o is in the typestate t is represented as the set
membership of o in t. Through inclusion and disjointness constraints,
sets can also express relationships (such as hierarchy or
orthogonality) between different typestates. Objects can be represented
as sets of cardinality one using a cardinality constraint
|o| = 1, so set membership reduces to subset. Multiple set memberships
can then encode constraints such as $|t| \geq k$ for any constant
k.

Sets can also provide natural abstractions of container data 
structures. When a content of a data structure is represented as an 
abstract set s, an operation such as insertion can be characterized 
by a postcondition $s' = s \cup e$ where e is the set corresponding
to the element being inserted. By expressing both typestates and 
data structure abstractions, sets can be used to combine the results 
of different analyses operating on the same program. Such an 
approach allows us to combine the scalability of typestate analysis 
with the precision of shape analysis and theorem proving [30, 28].

Sets with cardinality constraints. The use of the cardinality
operator on sets leads to a connection between set algebra operations
and integer linear arithmetic, as evidenced, for example, in
the condition $|a \cup b| = |a| + |b|$ for disjoint sets a and b. It is
therefore natural to consider constraints that combine integer linear
arithmetic with set algebra operations. These constraints constitute
the Quantifier-Free Boolean Algebra with Presburger Arithmetic,
or QFBAPA for short; they are the quantifier-free fragment of
BAPA constraints whose decision procedure and complexity we
have studied in [23, 22]. QFBAPA constraints can be used to verify
an invariant such as $|a| = |b|$, which allows us to conclude that
if a is nonempty, so is b, and therefore it is possible to call an
operation that removes an element from b. Similarly, if i is an integer
variable and s is a set, it is possible to verify an invariant $|s| = i$
stating that the integer i correctly maintains the size of the set s.
In our experience, specialized decision procedures such as [22] are
the only automated technique for deciding formulas with non-trivial
cardinality constraints. Currently, however, the complexity of these
decision procedures limits their applicability. In this paper we give
new algorithms for solving set cardinality constraints; these
algorithms provide exponential improvements over existing approaches
and make the checking of cardinality constraints in larger formulas
more feasible.

Our paper provides a systematic study of constraints on sets in 
the presence of cardinalities. We study both more expressive and 
less expressive fragments and demonstrate a trade-off between the 
$F ::= A \mid F_1 \land F_2 \mid F_1 \lor F_2 \mid \lnot F$

$A ::= B_1 = B_2 \mid B_1 \subseteq B_2 \mid T_1 = T_2 \mid T_1 < T_2 \mid K \ \mathrm{dvd}\ T$

$B ::= s \mid 0 \mid 1 \mid B_1 \cup B_2 \mid B_1 \cap B_2 \mid B^c$

$T ::= i \mid K \mid T_1 + T_2 \mid K \cdot T \mid |B|$

$K ::= \ldots \mid -2 \mid -1 \mid 0 \mid 1 \mid 2 \mid \ldots$

Figure 1. Quantifier-Free Formulas of Boolean Algebra with Pres- 
burger Arithmetic (QFBAPA) 



expressive power and the efficiency of the algorithms. The main 
contributions of our paper are the following: 

• PSPACE algorithm for QFBAPA. The best previously
known algorithms for QFBAPA [23, 22, 45] execute in
nondeterministic exponential time, and involve searching for an
exponentially large object. In this paper we first give a form of
bounded model property that shows that it is possible to replace
reasoning about symbolic cardinalities such as $|a| = i \land |b| = i$,
where i is an integer variable, with guessing sufficiently large
constant cardinalities, such as $|a| = 1000 \land |b| = 1000$.
Moreover, we give a space-efficient algorithm for solving the
resulting constraints on sets with large constant cardinalities. This
gives a PSPACE decision procedure for QFBAPA and is the
first contribution of this paper.

• A Polynomial-Time Class. Given that QFBAPA constraints 
are NP-hard, the question remains whether there are interest- 
ing fragments of sets with cardinalities which can be reasoned 
about in polynomial time. In a quest for such fragments, we 
identify several features of constraints, each of which leads to 
NP-hardness. By eliminating these features we have discovered 
a class (called i-trees) that has polynomial-time satisfiability
and entailment (subsumption) problems, while still supporting 
subset, union, disjointness, and arbitrarily large cardinality con- 
straints. This class can therefore express generalized typestate 
constraints such as multiple orthogonal classifications into inde- 
pendent or disjoint sets. The identification of this polynomial- 
time class, and the development of algorithms for testing the 
satisfiability and subsumption of constraints in this class is the 
second contribution of this paper. While the resulting algo- 
rithms are efficient, the proof of their completeness is some- 
what lengthy, and involves characterizations of normal forms 
of i-trees and the construction of models for i-trees in normal 
form. We therefore only summarize the main ideas; we refer
the reader to the full version of the paper [32] for details.
Additional proofs are also included in the Appendix.

We proceed by defining the fragment QFBAPA in Section 2. We
present a PSPACE algorithm for QFBAPA in Section 3, defining
the simpler CBAC constraints and identifying their NP-complete
fragment, CBASC constraints.

2. Constraints on Sets with Cardinalities 

Boolean Algebra with Presburger Arithmetic. Figure 1 presents
the syntax of the constraints studied in this paper; we call these
formulas Quantifier-Free Boolean Algebra with Presburger Arithmetic
(QFBAPA). QFBAPA constraints contain two kinds of values:
integers and sets, each with corresponding applicable operations. The
sets are interpreted as subsets of some arbitrarily large finite set. s
denotes a set variable, i denotes an integer variable. The symbol
$|B|$ denotes the cardinality of the set B and establishes the
connection between set and integer terms. MAXC is a special free variable
denoting the size of the universal set. If b is a set, $b^c$ denotes its
complement. K dvd T denotes that K divides T. K denotes
constants, encoded in binary: a constant k is encoded using $O(\log k)$
bits. The symbol A in Figure 1 denotes atomic formulas; a literal is
an atomic formula or its negation.

A quantified version of this language (BAPA) is studied in
[23, 22], where we give an algorithm that establishes a doubly
exponential space upper bound on the complexity. Because quantified
BAPA subsumes Presburger arithmetic, the doubly exponential
nondeterministic time lower bound [15] applies to BAPA as well.

Preliminaries. If S is a finite set, $|S|$ denotes the number of
elements in S. A literal is an atomic formula or its negation.
$\mathbb{Z} = \{\ldots, -1, 0, 1, \ldots\}$ is the set of integers,
$\mathbb{N} = \{0, 1, \ldots\}$ is the set of natural numbers.
$[a..b]$ denotes the set of integers $\{a, a+1, \ldots, b\}$.
If $f : A \to B$ is a function and $S \subseteq A$, we define
$f[S] = \{f(a) \mid a \in S\}$.

If A is a set, the notation $A^y$ has several potential meanings;
the specific meaning should be clear from the context. $A^n$ for
$n \in \{1, 2, \ldots\}$ is the set of vectors $(a_1, \ldots, a_n)$ where
$a_j \in A$ for $1 \leq j \leq n$, and $A^{m,n}$ is the set of matrices
$[a_{pq}]$ with m rows and n columns with elements $a_{pq}$ for
$1 \leq p \leq m$ and $1 \leq q \leq n$. The expression $A^c$ denotes the
complement of the set A. If $\alpha \in \{0, 1\}$, then $A^{\alpha}$
denotes A for $\alpha = 1$ and $A^c$ for $\alpha = 0$.

The relation $\equiv$ denotes the equality of the values of
metavariables denoting syntactic objects, so if $f_1$ and $f_2$ are
formulas, then $f_1 \equiv f_2$ means that they are the same formula. In
the context of inclusion diagrams (Section 4), $\equiv$ will denote the
semantic equivalence of diagrams (we use = to denote the equality of
diagrams).

3. A PSPACE Algorithm for QFBAPA 

Verification conditions arising in program verification can often
be expressed using quantifier-free formulas, so it is natural to
examine whether more efficient algorithms exist for QFBAPA
constraints. When applied to QFBAPA formulas, existing algorithms
run in non-deterministic exponential time (NEXPTIME): the
algorithm [45] requires nondeterministically guessing an
exponentially large object, whereas the algorithm $\alpha$ from [22]
produces an exponentially large quantifier-free Presburger arithmetic
formula. The question arises whether there exist algorithms that avoid
non-deterministically guessing exponentially large objects. We show
that this is indeed the case. Namely, we first show that Presburger
arithmetic formulas generated by the algorithm $\alpha$ from [22] can in
fact be solved in deterministic exponential time. Our result reduces
QFBAPA to a simpler system of CBAC constraints (shown in
Figure 3), then applies a theorem by Papadimitriou [36] in a novel
way. This leads to a deterministic EXPTIME decision procedure
for QFBAPA satisfiability, which is an improvement on previously
existing algorithms. Nevertheless, the question arises whether it is
possible to avoid the construction of an exponentially large
system of equations. It turns out that this is indeed possible: we
present an alternating polynomial-time (and therefore, PSPACE)
algorithm for QFBAPA. Therefore, it is possible to solve QFBAPA
using solvers for quantified boolean formulas [9, 48, 37].

Figures 2 and 4 present our PSPACE algorithm for QFBAPA.
The algorithm has two phases.

In the first phase, the non-deterministic polynomial-time
algorithm in Figure 2 reduces QFBAPA constraints to a simpler class
of constraints. We call these simpler constraints Conjunctions of
Boolean Algebra expressions with Cardinalities (CBAC). CBAC
constraints have a very simple syntactic structure (see Figure 3),
but capture the key difficulty in solving QFBAPA: the need to
consider exponentially large cardinalities on exponentially many set
partitions.

In the second phase, the algorithm in Figure 4 checks the
satisfiability of CBAC in alternating polynomial time and therefore in
polynomial space. The key insight behind our algorithm is that it
is possible to use a divide and conquer approach to avoid explicitly
representing all possible regions in the Venn diagram.
Let f be the input QFBAPA formula.

1. Replace each $\mathbb{Z}$-variable with a difference of two
$\mathbb{N}$-variables:

\[ C[i_1, \ldots, i_n] \;\to\; C[i_1' - i_1'', \ldots, i_n' - i_n''] \]

where $i_1', i_1'', \ldots, i_n', i_n''$ are fresh $\mathbb{N}$-variables.

2. Ensure that all set algebra expressions appear within
cardinality constraints by normalizing with the following
rules:

\[ C[b_1 = b_2] \;\to\; C[b_1 \subseteq b_2 \land b_2 \subseteq b_1] \]
\[ C[b_1 \subseteq b_2] \;\to\; C[\,|b_1 \cap b_2^c| = 0\,] \]

3. Eliminate divisibility constraints:

\[ C[k \ \mathrm{dvd}\ t] \;\to\; C[k\,i = t], \quad i \text{ is a fresh } \mathbb{N}\text{-variable} \]

4. Move all cardinality constraints to top level:

\[ C[\,|b_1|, \ldots, |b_w|\,] \;\to\; f_1 \land f_2 \]

where

\[ f_1 = C[i_1, \ldots, i_w], \qquad f_2 \stackrel{def}{=} |1| = \mathrm{MAXC} \land \bigwedge_{j=1}^{w} |b_j| = i_j \]

and $i_1, \ldots, i_w$ are fresh $\mathbb{N}$-variables; let $m_1 = w + 1$.

5. Let p be a propositional formula such that
$p(a_1, \ldots, a_{m_0}) = f_1$ for atomic formulas $a_1, \ldots, a_{m_0}$.
Nondeterministically select the truth value $\alpha_j \in \{0, 1\}$
for each atomic formula $a_j$, so that $p(\alpha_1, \ldots, \alpha_{m_0})$ is
true. Let

\[ f_{11} \stackrel{def}{=} \bigwedge_{j=1}^{m_0} a_j^{\alpha_j}. \]

6. For each conjunct $\lnot(t_1 = t_2)$ in $f_{11}$,
non-deterministically replace the conjunct with one of
the conjuncts $(t_1 + 1 \leq t_2)$ or $(t_2 + 1 \leq t_1)$.

7. Transform linear integer constraints to normal form:

\[ C[\lnot(t_1 \leq t_2)] \;\to\; C[t_2 + 1 \leq t_1] \]
\[ C[t_1 \leq t_2] \;\to\; C[t_1 - t_2 + i = 0] \]
\[ C[t_1 = t_2] \;\to\; C[\textstyle\sum_{j=1}^{n} c_j i_j = k] \]

8. Let $n_0$ be the number of $\mathbb{N}$-variables in the entire
formula. The resulting system is of the form:

\[ A v = d \;\land\; \bigwedge_{j=1}^{m_1} |b_j| = i_{p_j} \]

where $A \in \mathbb{Z}^{m_0,n_0}$, $d \in \mathbb{Z}^{m_0}$, and
$v = (i_1, \ldots, i_{n_0})$ where each $i_j$ is an $\mathbb{N}$-variable and
$1 \leq p_1, \ldots, p_{m_1} \leq n_0$ index the variables denoting
cardinalities of sets. Let S be the total number of set variables in
$b_1, \ldots, b_{m_1}$. Let

\[ m = m_0 + m_1, \qquad n = \max(n_0, 2^S), \]
\[ a = \max(\{1\} \cup \{|a_{pq}| \mid 1 \leq p \leq m_0,\ 1 \leq q \leq n_0\} \cup \{|d_q| \mid 1 \leq q \leq m_0\}) \]

where $A = [a_{pq}]$ and $d = (d_1, \ldots, d_{m_0})$, and let

\[ M = n(ma)^{2m+1}. \]

9. Non-deterministically select a vector $k = (k_1, \ldots, k_{n_0})$
where $k_j \in \{0, 1, \ldots, M\}$ for $1 \leq j \leq n_0$, such that
$A k = d$.

10. Call the CBAC decision procedure on
$\bigwedge_{j=1}^{m_1} |b_j| = k_{p_j}$. If there exists a solution, then
report the formula satisfiable.

Figure 2. An NP Algorithm for Reducing QFBAPA Constraints
to CBAC constraints of Figure 3



$F ::= |B| = K \mid F_1 \land F_2$

$B ::= s \mid 0 \mid 1 \mid B_1 \cup B_2 \mid B_1 \cap B_2 \mid B^c$

$K ::= 0 \mid 1 \mid 2 \mid \ldots$



Figure 3. Conjunctions of Boolean Algebra expressions with Car- 
dinalities (CBAC) 

Given a CBAC constraint

\[ \bigwedge_{j=1}^{m_1} |b_j| = k_j \]

where the free set variables of $b_1, \ldots, b_{m_1}$ are among
$s_1, \ldots, s_S$, run CBAC-check([], d) with $d = (k_1, \ldots, k_{m_1})$.

proc CBAC-check($[v_1, \ldots, v_n]$, d) returns result
    where $v_1, \ldots, v_n$, result $\in \{0, 1\}$; $d \in \mathbb{N}^{m_1}$
    if (n < S) then
        existentially choose $d_0, d_1 \in \mathbb{N}^{m_1}$ such that $d_0 + d_1 = d$;
        universally do
            $r_1$ = CBAC-check($[v_1, \ldots, v_n, 0]$, $d_0$) and
            $r_2$ = CBAC-check($[v_1, \ldots, v_n, 1]$, $d_1$);
        return $r_1 \land r_2$;
    else
        let $p_j$ = eval($b_j$, $[s_1 \mapsto v_1, \ldots, s_S \mapsto v_S]$)
            for all j ($1 \leq j \leq m_1$);
        $J_0 = \{d^j \mid p_j = 0\}$;
        $J_1 = \{d^j \mid p_j = 1\}$;
        return $J_0 \subseteq \{0\} \land |J_1| \leq 1$.

proc eval(b, $\alpha$) returns result
    where b : Boolean Algebra formula
          $\alpha : \{s_1, \ldots, s_S\} \to \{0, 1\}$
          result $\in \{0, 1\}$
    treating b as a propositional formula,
    return the value of b under assignment $\alpha$.



Figure 4. An Alternating Polynomial-Time (and PSPACE) Algo- 
rithm for Checking the Satisfiability of CBAC Constraints 
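
The procedure of Figure 4 can be simulated deterministically. The following Python sketch is an illustration only, not the authors' implementation: the existential choice is replaced by exhaustive search over all splits $d_0 + d_1 = d$ and the universal step by a conjunction, so the running time is exponential, while the recursion depth stays equal to the number of set variables. The tuple encoding of set expressions and the names `eval_expr`, `splits`, and `cbac_check` are assumptions made for this example.

```python
from itertools import product

def eval_expr(b, a):
    """Evaluate a set expression as a propositional formula under assignment a.
    Expressions are nested tuples (a hypothetical encoding for this sketch)."""
    tag = b[0]
    if tag == "var":
        return bool(a[b[1]])
    if tag == "empty":
        return False
    if tag == "univ":
        return True
    if tag == "compl":
        return not eval_expr(b[1], a)
    if tag == "union":
        return eval_expr(b[1], a) or eval_expr(b[2], a)
    if tag == "inter":
        return eval_expr(b[1], a) and eval_expr(b[2], a)
    raise ValueError(f"unknown expression tag: {tag}")

def splits(d):
    """Enumerate all pairs (d0, d1) of non-negative vectors with d0 + d1 = d."""
    for d0 in product(*(range(x + 1) for x in d)):
        yield d0, tuple(x - y for x, y in zip(d, d0))

def cbac_check(exprs, names, d, prefix=()):
    """Deterministic simulation of CBAC-check: `any` plays the existential
    choice and `and` plays the universal step."""
    if len(prefix) < len(names):
        return any(
            cbac_check(exprs, names, d0, prefix + (0,))
            and cbac_check(exprs, names, d1, prefix + (1,))
            for d0, d1 in splits(d)
        )
    # Leaf: a complete assignment identifies one region of the Venn diagram.
    a = dict(zip(names, prefix))
    p = [eval_expr(b, a) for b in exprs]
    j0 = {dj for dj, pj in zip(d, p) if not pj}
    j1 = {dj for dj, pj in zip(d, p) if pj}
    return j0 <= {0} and len(j1) <= 1

s1, s2 = ("var", "s1"), ("var", "s2")
constraints = [s1, s2, ("inter", s1, s2)]
sat = cbac_check(constraints, ["s1", "s2"], (2, 2, 1))    # |s1|=2, |s2|=2, |s1 ∩ s2|=1
unsat = cbac_check(constraints, ["s1", "s2"], (1, 1, 2))  # |s1 ∩ s2| exceeds |s1|
```

At a leaf, every expression that contains the region must be assigned the same share (the size of that region), and every expression that excludes it must be assigned zero, which is exactly the test on $J_0$ and $J_1$.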

We next discuss our algorithm in more detail and argue that it is
correct. We begin with the description of the steps of the algorithm
in Figure 2, which reduces symbolic cardinalities to large constant
cardinalities.

1. Non-negative integers. To simplify the later steps, the first step
makes all integer variables range over non-negative integers $\mathbb{N}$, by
replacing each integer variable i with a difference $i_1 - i_2$ of fresh
non-negative integer variables $i_1, i_2$.

2,3. Eliminating set equality and subset, and integer divisibility. 

The next step converts set equality and set subset into cardinality 
constraints. This step helps the later separation between the boolean 
algebra part and the integer linear arithmetic part. We then elimi- 
nate any divisibility relations using multiplication and a fresh vari- 
able. 
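
As a small illustration of steps 2 and 3 (a sketch under stated assumptions, not part of the algorithm's presentation), the Python fragment below checks on a three-element universe that the subset relation agrees with the cardinality constraint $|b_1 \cap b_2^c| = 0$, and expresses divisibility through a witness i with $k\,i = t$. The function names are hypothetical, and the divisibility helper assumes $k \geq 1$ and $t \geq 0$.

```python
from itertools import combinations

def subset_via_cardinality(b1, b2, universe):
    """Step 2: b1 ⊆ b2 holds iff |b1 ∩ b2^c| = 0."""
    complement_b2 = universe - b2
    return len(b1 & complement_b2) == 0

def divides_via_witness(k, t):
    """Step 3: k dvd t holds iff k*i = t for some natural number i.
    Assumes k >= 1 and t >= 0, so any witness lies in [0..t]."""
    return any(k * i == t for i in range(t + 1))

def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

universe = frozenset({1, 2, 3})
# The rewritten constraint agrees with the subset relation on every pair.
agree = all(subset_via_cardinality(b1, b2, universe) == (b1 <= b2)
            for b1 in powerset(universe) for b2 in powerset(universe))
```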

4. Flattening. The next step separates the formula into the
integer linear arithmetic part, denoted $f_1$, and the boolean algebra
part, denoted $f_2$. This step simply amounts to naming the
cardinality of each set by a fresh integer variable.

5,6. From quantifier-free formulas to conjunctions. An obvious
source of NP-completeness of QFBAPA is the presence of arbitrary
propositional combinations of atomic formulas. An effective way
of dealing with propositional combinations is to enumerate the
satisfying assignments of the propositional formula using a SAT
solver, and then solve the conjunctions of literals [16, 17]. Steps 5
and 6 of the non-deterministic algorithm in Figure 2 are an abstract
description of such a procedure. The goal of step 6 is to eliminate
disequalities, which involves a non-deterministic choice between the
two inequalities.

7. Normal form for integer constraints. The algorithm
eliminates the remaining negations of atomic formulas and transforms
linear constraints into normal form $Av = d$.

8,9,10. Estimating sizes of integer variables. The resulting
system contains linear integer equations of the form
$\sum_{j=1}^{n} c_j i_j = k$, and set cardinality constraints of the form
$|b| = i$. The algorithm computes an upper bound M on integer variables
in any potential solution of the system, using several parameters: the
number of conjuncts m, the number of integer variables $n_0$ and the
number of set variables S. The computation of the upper bound is based
on an observation that the satisfiability of the conjunction of
constraints $|b| = i$ can be reduced to the satisfiability of equations
of the form $\sum_{j} l_j = i$, where variables $l_j$ denote sizes of set
partitions (regions in the Venn diagram) whose union is the set b; this
is a specialization of the idea in [22] to the case of quantifier-free
formulas.

Let $s_1, \ldots, s_S$ be all set variables appearing in the formula and
consider a constraint $|b| = i$. Consider all partitions
$\bigcap_{j=1}^{S} s_j^{\alpha_j}$ for $\alpha_j \in \{0, 1\}$. For each such
partition $b_p$, introduce a fresh $\mathbb{N}$-variable $l_p$, which
denotes the cardinality of cube $b_p$. Then consider a constraint of the
form $|b| = i$. Each set is a union of regions in the Venn diagram (by
the disjunctive normal form theorem), so suppose that
$b = b_{p_1} \cup \ldots \cup b_{p_a}$. Then replace the constraint
$|b| = i$ with the equation $\sum_{q=1}^{a} l_{p_q} = i$. We use the term
"CBAC linear equations" to denote a system of linear equations resulting
from the constraints $|b| = i$ as described above.
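
The decomposition of a set expression into Venn regions can be made concrete with a short Python sketch. It is an illustration under stated assumptions: set expressions are encoded as nested tuples, and the names `eval_expr`, `regions_of`, and `region_sizes` are hypothetical. The sketch enumerates all $2^S$ cubes explicitly, which is exactly the exponential blow-up that the later PSPACE algorithm avoids.

```python
from itertools import product

def eval_expr(b, a):
    """Evaluate a set expression as a propositional formula under assignment a."""
    tag = b[0]
    if tag == "var":
        return bool(a[b[1]])
    if tag == "empty":
        return False
    if tag == "univ":
        return True
    if tag == "compl":
        return not eval_expr(b[1], a)
    if tag == "union":
        return eval_expr(b[1], a) or eval_expr(b[2], a)
    if tag == "inter":
        return eval_expr(b[1], a) and eval_expr(b[2], a)
    raise ValueError(f"unknown expression tag: {tag}")

def regions_of(b, names):
    """Cubes (bit vectors alpha) whose union is b; l_p is summed over these."""
    return [bits for bits in product((0, 1), repeat=len(names))
            if eval_expr(b, dict(zip(names, bits)))]

def region_sizes(alpha, names, universe):
    """Map each cube to the number of universe elements that fall into it."""
    sizes = {}
    for x in universe:
        bits = tuple(int(x in alpha[s]) for s in names)
        sizes[bits] = sizes.get(bits, 0) + 1
    return sizes

names = ["s1", "s2"]
b = ("union", ("var", "s1"), ("var", "s2"))
alpha = {"s1": {1, 2}, "s2": {2, 3}}
sizes = region_sizes(alpha, names, {1, 2, 3, 4})
# |b| equals the sum of the sizes l_p of the regions that make up b.
card_b = sum(sizes.get(p, 0) for p in regions_of(b, names))
```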

As a result, we obtain a system of mo + mi linear equations 
over non-negative integers, where mo equations have a polynomial 
number of variables, and mi equations (CBAC linear equations) 
have exponentially many variables. It is easy to see that there exists 
a surjective mapping of solutions of the original constraints on 
sets onto solutions of the resulting linear equations (the mapping 
computes the cardinality of each Venn region). Therefore, the
original system is satisfiable if and only if the resulting equations 
are satisfiable. Moreover, we have the following fact. 

FACT 1 (Papadimitriou [36]). Let A be an $m \times n$ integer matrix
and b an m-vector, both with entries from $[-a..a]$. Then the system
$Ax = b$ has a solution in $\mathbb{N}^n$ if and only if it has a solution in
$[0..M]^n$ where $M = n(ma)^{2m+1}$.
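
To make the size of this bound concrete, the following Python sketch (an illustration with hypothetical function names) computes $M = n(ma)^{2m+1}$ from the parameters of step 8 of Figure 2. Although n may be as large as $2^S$, the number of bits of M grows only polynomially in m, S, and the bit size of a.

```python
def papadimitriou_bound(m, n, a):
    """M = n * (m*a)^(2m+1), the bound of Fact 1."""
    return n * (m * a) ** (2 * m + 1)

def qfbapa_bound(m0, m1, n0, num_set_vars, a):
    """Bound of step 8: m = m0 + m1 and n = max(n0, 2^S)."""
    m = m0 + m1
    n = max(n0, 2 ** num_set_vars)
    return papadimitriou_bound(m, n, a)

# One integer equation, two cardinality constraints, three integer
# variables, two set variables, coefficients bounded by 1.
M = qfbapa_bound(m0=1, m1=2, n0=3, num_set_vars=2, a=1)
bits = M.bit_length()  # size of a guessed value in binary
```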

Fact 1 implies that the estimate M computed in step 8 of the
algorithm in Figure 2 is a correct upper bound. Using this estimate, step
9 of the algorithm non-deterministically guesses the values of all
integer variables such that the original linear equations $Av = d$ are
satisfied. All this computation can be performed in nondeterministic
polynomial time, and (unlike [22]) does not involve explicitly
constructing a system with exponentially many equations. Having
picked the values of integer variables, including the variables i on
the right hand side of constraints $|b| = i$, we obtain a conjunction
of constraints of the form $|b| = k$ where k is a constant whose
binary representation has polynomially many bits; these are
precisely the CBAC constraints in Figure 3. We have therefore shown
the following.

LEMMA 1. The algorithm in Figure 2 reduces in non-deterministic
polynomial time the satisfiability of a QFBAPA formula to the
satisfiability of CBAC formulas.

It remains to find an algorithm for CBAC constraints. 



A PSPACE algorithm for CBAC. One correct way to solve
CBAC constraints is to solve the associated CBAC linear
equations. This system has exponentially many variables, each of which
can take any value from [0..M]. Therefore, guessing the values of
each of these variables can be done in non-deterministic
exponential time; similar approaches not based on equations also require
guessing exponentially large objects [45]. Note, however, that there
are only polynomially many CBAC linear equations. Using the idea
of the proof of [36, Corollary 1], we can therefore show that a dynamic
programming algorithm can be used to solve the system in
polynomial time. In fact, we can use the dynamic programming algorithm
from the proof of [36, Corollary 1]. Instead of fixing the size of the
equations $m_1$ to be constant, we simply observe that $m_1$ is
polynomial in the size of the input, whereas the number of variables
is singly exponential. The bound M therefore yields a singly
exponential deterministic time dynamic programming algorithm for
CBAC. While this is better than existing results, we show that an
even better result is achievable.

Clearly, any algorithm that explicitly constructs CBAC
equations will require at least exponential time and space. Our solution
is therefore to adapt the dynamic programming algorithm to a
divide and conquer approach that always represents the equations in
terms of their original, polynomially sized, boolean algebra
expression. Such an algorithm runs in alternating polynomial time,
consuming polynomial space, and is presented in Figure 4. To see the
idea of our PSPACE algorithm, consider the CBAC linear system
of equations written in the vector form: $\sum_{j=1}^{2^p} a_j l_j = d$
where d, $a_j$ are vectors and $l_j$ are the variables for
$1 \leq j \leq 2^p$. The algorithm guesses the vectors
$d_0, d_1 \in \mathbb{N}^{m_1}$ such that $d_0 + d_1 = d$, and
recursively solves two equations:

\[ \sum_{j=1}^{2^{p-1}} a_j l_j = d_0 \quad \land \quad \sum_{j=2^{p-1}+1}^{2^p} a_j l_j = d_1 \]

This algorithm creates an OR-AND tree whose search gives the
answer to the original problem. A position in the tree is given by the
propositional assignment $[v_1, \ldots, v_n]$ to boolean variables. Each
leaf in the tree is given by a complete assignment $[v_1, \ldots, v_S]$
to set variables. Note that we never need to explicitly maintain
the system during the divide phase of the algorithm; it suffices to
determine in the leaf case $p = 0$ whether the coefficient $a_{ij}$ is 0
or 1. The algorithm does this by simply evaluating each Boolean
algebra expression b for the assignment $[v_1, \ldots, v_S]$.

THEOREM 1. The algorithm in Figure 4 checks the satisfiability of
CBAC constraints in PSPACE. The algorithm given by Figures 2
and 4 checks the satisfiability of QFBAPA constraints in PSPACE.

Theorem 1 improves the existing algorithms for QFBAPA from
both a complexity theoretic and an implementation viewpoint. A
deterministic realization of previous NEXPTIME algorithms runs
in doubly exponential worst-case time and requires exponential
space; a deterministic realization of our new algorithm runs in
singly exponential time and consumes polynomial space. Previous
algorithms would require running a constraint solver such as a SAT
solver [47] on an exponentially large constraint; the new algorithm
can be realized by running a quantified boolean formula solver [48]
on a polynomially large constraint.

NP fragments of CBAC. We have seen that both CBAC and
QFBAPA constraints are in PSPACE. Both of these classes of
constraints are NP-hard, because the constraint $|b| = 1$ is satisfiable iff
b corresponds to a satisfiable propositional formula. Moreover,
Lemma 1 shows that QFBAPA constraints are in NP iff CBAC
constraints are in NP. For some subclasses of CBAC constraints we can
indeed show membership in NP. Define conjunctions of boolean
algebra expressions with small cardinalities, denoted CBASC, to be
the same as CBAC but with constant integers encoded in unary
notation, where an integer x is represented in space $O(x)$ as
opposed to $O(\log x)$; such an encoding can therefore be exponentially
less compact.

LEMMA 2. The satisfiability of CBASC constraints is NP- 
complete. 

CBASC constraints are NP-hard because $|b| = 1$ is a CBASC
constraint. One way to prove membership in NP is to observe that
CBASC is subsumed by the language of set-valued fields, which
was proven to be in NP [24, 25] by reduction to the universal class
of first-order logic formulas, which has the small model property
[7, Page 258]. Another way is to consider the notion of sparse
solutions of CBAC linear equations. An M-sparse solution is a solution
to CBAC linear constraints with at most M non-zero elements. An
M-sparse solution to CBAC linear constraints with $2^S$ variables
can be encoded as an M-tuple of pairs $([v_1, \ldots, v_S], k)$ where the
propositional assignment $[v_1, \ldots, v_S]$ encodes one of the $2^S$
integer variables, and k specifies the value of that integer variable.
This encoding is polynomial in $MSw$ where w is the number of
bits for representing the largest component of the solution. For any
CBAC linear constraint $\bigwedge_{j=1}^{m} |b_j| = k_j$, each solution is
M-sparse where $M = \max(k_1, \ldots, k_m)$. For CBASC constraints, M is
polynomial in the size of the CBASC representation because each $k_i$ is
encoded in unary, so sparse solutions can be guessed in polynomial
time. This proves that CBASC constraints are in NP.¹
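
The polynomial-time verification step behind this argument can be sketched as follows. The Python fragment is illustrative only: the tuple encoding of set expressions and the names `eval_expr` and `check_sparse_solution` are assumptions of this example. A sparse solution is given as a list of pairs (assignment, k); regions that are not listed have size zero.

```python
def eval_expr(b, a):
    """Evaluate a set expression as a propositional formula under assignment a."""
    tag = b[0]
    if tag == "var":
        return bool(a[b[1]])
    if tag == "empty":
        return False
    if tag == "univ":
        return True
    if tag == "compl":
        return not eval_expr(b[1], a)
    if tag == "union":
        return eval_expr(b[1], a) or eval_expr(b[2], a)
    if tag == "inter":
        return eval_expr(b[1], a) and eval_expr(b[2], a)
    raise ValueError(f"unknown expression tag: {tag}")

def check_sparse_solution(exprs, names, ks, sparse):
    """Check that the listed regions satisfy |b_j| = k_j for every j.
    Runs in time polynomial in the size of the sparse encoding."""
    seen = set()
    totals = [0] * len(exprs)
    for bits, count in sparse:
        # Each region may be listed once, with a non-negative size.
        if bits in seen or count < 0 or len(bits) != len(names):
            return False
        seen.add(bits)
        a = dict(zip(names, bits))
        for j, b in enumerate(exprs):
            if eval_expr(b, a):
                totals[j] += count
    return totals == list(ks)

s1, s2 = ("var", "s1"), ("var", "s2")
exprs = [s1, s2, ("inter", s1, s2)]
good = check_sparse_solution(exprs, ["s1", "s2"], (2, 2, 1),
                             [((1, 1), 1), ((1, 0), 1), ((0, 1), 1)])
bad = check_sparse_solution(exprs, ["s1", "s2"], (2, 2, 1), [((1, 1), 2)])
```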

4. Inclusion Diagrams 

This section introduces inclusion diagrams (i-diagrams), a graph 
representation of CBAC constraints. Figure 5 shows a formula with
sets and cardinalities and an equivalent i-diagram. I-diagrams allow 
us to naturally describe fragments of CBAC constraints and the al- 
gorithms for checking satisfiability and subsumption of these frag- 
ments. The basic idea of i-diagrams is to represent the subset partial 
order using a graph where sets are annotated with cardinalities, and 
then indicate the disjointness and union relations by constraints on 
direct subsets of a set. To efficiently represent equal sets, the nodes 
in the i-diagram stand not for set names, but for collections of set 
names that are guaranteed to be equal. Finally, we associate un- 
interpreted predicates with collections of nodes, representing the 
fact that elements of given sets satisfy the properties given by the 
predicate. The uninterpreted predicates illustrate a way to combine 
i-diagram representations with other constraints. 

DEFINITION 1 (i-diagrams). We fix a finite set SN of Set-Names,
and a finite set PN of predicate names. We denote by $PN^{\pm}$ the set
of atoms $\{+P, -P \mid P \in PN\}$.

An i-diagram (Inclusion-Diagram) is either the null-diagram $\bot_d$ or
a tuple $(\mathbb{S}, \emptyset_d, \mathrm{Sons}, \mathrm{Split}, \mathrm{Comp}, \mathrm{CInf}, \mathrm{CSup}, \Phi)$ such that:

• $\mathbb{S} \subseteq \mathcal{P}(SN)$ is a partition of SN containing
(nonempty) equivalence classes of set names that are guaranteed to be
equal, with $\emptyset_d \in \mathbb{S}$ the equivalence class
corresponding to names of sets whose interpretation is the empty set
$\emptyset$;

• $\mathrm{Sons} : \mathbb{S} \to \mathcal{P}(\mathbb{S})$ represents the subset relation;

we define $S \leadsto S' \iff S \in \mathrm{Sons}(S')$; then
$(\mathbb{S}, \leadsto)$ is a graph, so we call elements of $\mathbb{S}$
nodes, and the elements of $\leadsto$ edges; we write $\leadsto^{*}$ for
the transitive closure of $\leadsto$;

• $\mathrm{Split}, \mathrm{Comp} : \mathbb{S} \to \mathcal{P}(\mathcal{P}(\mathbb{S}))$
represent disjointness and completeness of set inclusions; if S is a
node, then $\mathrm{Split}(S)$ is a set of split views, where each view is
a nonempty set of sons that



$\mathcal{D}$ is such that $\mathrm{CInf}(\{s_1\}) = 1$, $\mathrm{CSup}(\{s_1\}) = 5$,
$\mathrm{Sons}(\{s_1\}) = \{\{s_5, s_6\}, \{s_4\}, \{s_3\}\}$,
$\mathrm{Comp}(\{s_1\}) = \{\{\{s_5, s_6\}, \{s_4\}, \{s_3\}\}\}$,
$\mathrm{Split}(\{s_1\}) = \{\{\{s_5, s_6\}\}, \{\{s_4\}, \{s_3\}\}\}$,
$\Phi(\{s_1\}) = \{+P\}$

and is equivalent to

\[ s_2 = \emptyset \land s_5 = s_6 \land s_2 \cup s_3 \cup s_4 \cup s_5 \subseteq s_1 \land s_3 \cap s_4 = \emptyset \land s_1 \subseteq s_3 \cup s_4 \cup s_5 \]
\[ \land\ 1 \leq |s_1| \leq 5 \land |s_4| = 1 \land |s_5| \leq 3 \land |s_3| \leq 2 \land \forall x \in s_1.\ P(x) \land \forall x \in s_3.\ \lnot Q(x) \]

Figure 5. An example i-diagram $\mathcal{D}$ and an equivalent formula



represent pairwise disjoint sets, and $\mathrm{Comp}(S)$ is a set of
complete views each of which is a set of nodes that represent sets
whose union is equal to the father; we require

\[ \bigcup \mathrm{Split}(S) = \mathrm{Sons}(S) \]
\[ \bigcup \mathrm{Comp}(S) \subseteq \mathrm{Sons}(S) \]

for all $S \in \mathbb{S}$;

• $\mathrm{CInf}, \mathrm{CSup} : \mathbb{S} \to \mathbb{N}$ specify lower and upper bounds on the
cardinality of sets;

• $\Phi : \mathbb{S} \to \mathcal{P}(PN^{\pm})$ maps nodes to the uninterpreted unary
predicates and their negations that are true for all sets of a
node.



¹ Sparse solutions are interesting for general CBAC constraints as well. As
of yet we have no example of a CBAC constraint whose associated CBAC
equation system is satisfiable but has no sparse solutions; moreover, we can
generalize the notion of sparse solutions to solutions representable using
binary decision diagrams [8] while preserving polynomial-time verifiability.



To avoid confusion between set names, nodes (sets of set names),
and views (sets of nodes), we use lowercase letters $s, s_1, s'$ to
denote set names, uppercase letters $S, S_1, S'$ to denote nodes,
and letters Q, C to denote views and sets of nodes in general.
When $\mathcal{D} \neq \bot_d$ is a diagram, unless otherwise stated, we
name its components $\mathbb{S}, \emptyset_d, \mathrm{Sons}, \mathrm{Split}, \mathrm{Comp}, \mathrm{CInf}, \mathrm{CSup}, \Phi$,
and similarly we name the components of $\mathcal{D}'$ as
$\mathbb{S}', \emptyset_d', \mathrm{Sons}', \mathrm{Split}', \mathrm{Comp}', \mathrm{CInf}', \mathrm{CSup}', \Phi'$.

In a graphical representation of an i-diagram, we represent
each element $S \in \mathbb{S}$ where $S = \{s_1, \ldots, s_n\}$ using the
underlying set $\{s_1, \ldots, s_n\}$. We represent inclusion
$S_1 \leadsto S_2$ by an arrow from $S_1$ to $S_2$. We represent a split
view $Q \in \mathrm{Split}(S)$ where $Q = \{S_1, \ldots, S_n\}$ with a circle
connected with undirected edges to $S_1, \ldots, S_n$ and an arrow leading
to S. We represent a complete view similarly, using a filled square
instead of a circle. For each node $S \in \mathbb{S}$ we indicate its
cardinality bounds by annotating the node with $[a..b]$ where
$a = \mathrm{CInf}(S)$, $b = \mathrm{CSup}(S)$.
We represent $\Phi(S) = \{\pm P_1, \ldots, \pm P_n\}$ by annotating S with
$\pm P_1, \ldots, \pm P_n$. We represent $\emptyset_d = \{s_1, \ldots, s_n\}$
by annotating the node $\{s_1, \ldots, s_n\}$ with $\emptyset_d$.

DEFINITION 2 (Semantics of i-diagrams). An interpretation of SN
and PN is a triple (Δ, α, Ξ) where

• Δ is a finite set (the universe);

• α : SN → 𝒫(Δ) specifies the values of sets;

• Ξ : PN → 𝒫(Δ) specifies the values of unary predicates.

An interpretation I is a model for an i-diagram 𝒟, denoted I ⊨ 𝒟,
iff ∀s ∈ ∅_d. α(s) = ∅, and for all S ∈ 𝕊 where S = {s1, ..., sn},
the following conditions hold:

• α(s1) = ... = α(sn);
  accordingly, define α(S) =def α(s1) = ... = α(sn)

• CInf(S) ≤ |α(S)| ≤ CSup(S)

• ∀P. (+P) ∈ Φ(S) ⇒ α(S) ⊆ Ξ(P)

• ∀P. (-P) ∈ Φ(S) ⇒ α(S) ⊆ Ξ(P)^c

• ∀S' ∈ Sons(S). α(S') ⊆ α(S)

• ∀Q ∈ Split(S). ∀S1, S2 ∈ Q. S1 ≠ S2 ⇒ α(S1) ∩ α(S2) = ∅

• ∀Q ∈ Comp(S). α(S) ⊆ ⋃_{S1 ∈ Q} α(S1)
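
The per-node conditions of Definition 2 can be checked mechanically for a given interpretation. The following is a minimal sketch, not part of the original development: it assumes a hypothetical dictionary encoding in which every node is identified by a name and α is given directly on nodes (so the conditions that all set names of a node have equal values, and that the names in ∅_d denote the empty set, are taken for granted). The function name `is_model` and the dictionary keys are illustrative assumptions.

```python
from itertools import combinations

def is_model(diagram, alpha, xi):
    """Check the per-node conditions of Definition 2 (hypothetical encoding).

    diagram: node name -> dict with keys cinf, csup, phi, sons, split, comp
    alpha:   node name -> set of universe elements (the value of the node)
    xi:      predicate name -> set of universe elements
    """
    for name, node in diagram.items():
        value = alpha[name]
        # CInf(S) <= |alpha(S)| <= CSup(S)
        if not node["cinf"] <= len(value) <= node["csup"]:
            return False
        for sign, pred in node["phi"]:
            # (+P): value is inside Xi(P); (-P): value avoids Xi(P)
            if sign == "+" and not value <= xi[pred]:
                return False
            if sign == "-" and value & xi[pred]:
                return False
        # every son denotes a subset of its father
        if any(not alpha[son] <= value for son in node["sons"]):
            return False
        # the nodes of a split view are pairwise disjoint
        for view in node["split"]:
            if any(alpha[a] & alpha[b] for a, b in combinations(view, 2)):
                return False
        # the nodes of a complete view cover the father
        for view in node["comp"]:
            if not value <= set().union(*(alpha[s] for s in view)):
                return False
    return True

def leaf(cinf, csup, phi=()):
    # helper for nodes without sons (illustrative only)
    return dict(cinf=cinf, csup=csup, phi=set(phi), sons=[], split=[], comp=[])

diagram = {
    "R": dict(cinf=3, csup=4, phi=set(), sons=["A", "B"],
              split=[("A", "B")], comp=[("A", "B")]),
    "A": leaf(2, 3, [("+", "P")]),
    "B": leaf(1, 1, [("-", "P")]),
}
xi = {"P": {1, 2}}
good = {"R": {1, 2, 3}, "A": {1, 2}, "B": {3}}
bad = {"R": {1, 2, 3}, "A": {1, 2}, "B": {2}}  # B overlaps A inside a split view
print(is_model(diagram, good, xi), is_model(diagram, bad, xi))
```

The complement condition for (-P) is tested as an empty intersection, which avoids having to pass the universe Δ explicitly.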

We use the standard notions of satisfiability, subsumption
(entailment), and equivalence:

𝒟 is satisfiable  ⇔  ∃I. I ⊨ 𝒟

𝒟 ⊨ 𝒟'  ⇔  ∀I. I ⊨ 𝒟 ⇒ I ⊨ 𝒟'

𝒟 ≡ 𝒟'  ⇔  𝒟 ⊨ 𝒟' ∧ 𝒟' ⊨ 𝒟



DEFINITION 3 (Explicit Disjointness).

We write disj_{𝒟,S0}(S1, S2) as a shorthand for

S1 ≠ S2 ∧ ∃Q ∈ Split(S0). S1, S2 ∈ Q

We say that S1, S2 are explicitly disjoint, and we write
disj*_𝒟(S1, S2), iff

∃S1', S2', S0 ∈ 𝕊. S1 ⇝* S1' ∧ S2 ⇝* S2' ∧ disj_{𝒟,S0}(S1', S2')

LEMMA 3. I-diagrams have the same expressive power as CBAC 
constraints. 

By "same expressive power" we here mean that there is a natural 
pair of mappings between the models of i-diagrams and solutions 
to CBAC constraints. 

Because nodes in i-diagrams are collections of set names, we 
can define the following operations. 

DEFINITION 4 (Factor-i-diagram). Let ρ ⊆ 𝕊 × 𝕊 be an equivalence
relation on nodes. We define 𝒟/ρ as follows. Define
⊥_d/ρ = ⊥_d. Let 𝒟 = (𝕊, ∅_d, Sons, Split, Comp, CInf, CSup, Φ).
We define 𝒟/ρ = 𝒟' = (𝕊', ∅_d', Sons', Split', Comp', CInf', CSup',
Φ') as follows. Define h so that if {S1, ..., Sn} is the equivalence
class of S under ρ, then h(S) = S1 ∪ ... ∪ Sn. If Q ⊆ 𝕊, define
h[Q] = {h(S) | S ∈ Q}. Then let 𝕊' = h[𝕊]. Consider S' ∈ 𝕊'.
Both 𝕊 and 𝕊' are partitions, and given S' ∈ 𝕊' there is a unique
set {S1, ..., Sn} ⊆ 𝕊 such that S' = S1 ∪ ... ∪ Sn. Then define:

CInf'(S') = max(CInf(S1), ..., CInf(Sn))
CSup'(S') = min(CSup(S1), ..., CSup(Sn))

Sons'(S') = h[Sons(S1) ∪ ... ∪ Sons(Sn)]
Φ'(S') = Φ(S1) ∪ ... ∪ Φ(Sn)

Split'(S') = {h[Q] | Q ∈ Split(S1) ∪ ... ∪ Split(Sn)}
Comp'(S') = {h[Q] | Q ∈ Comp(S1) ∪ ... ∪ Comp(Sn)}

DEFINITION 5 (Merge). For any i-diagram 𝒟 we define the
i-diagram 𝒟[Merge(Q)] =def 𝒟/ρ for the equivalence relation

ρ = {(S1, S2) | S1, S2 ∈ Q} ∪ {(S, S) | S ∈ 𝕊}
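
To make the factor construction concrete, here is a small sketch of Merge(Q) under an assumed encoding that is not taken from the paper: a node is a `frozenset` of set names, a view is a `frozenset` of nodes, and a diagram is a dictionary from nodes to their components. The function name `merge` and the dictionary keys are hypothetical. The sketch only computes the components of 𝒟/ρ as in Definition 4; it does not clean up the result.

```python
import math

def merge(diagram, q):
    """Factor `diagram` by the equivalence that identifies all nodes in `q`."""
    q = frozenset(q)
    merged = frozenset().union(*q)          # h(S) for every S in q

    def h(node):
        return merged if node in q else node

    def h_view(view):                       # h[Q]
        return frozenset(h(node) for node in view)

    result = {}
    for node, d in diagram.items():
        t = result.setdefault(h(node), dict(cinf=0, csup=math.inf, phi=set(),
                                            sons=set(), split=set(), comp=set()))
        t["cinf"] = max(t["cinf"], d["cinf"])   # CInf' is the max over the class
        t["csup"] = min(t["csup"], d["csup"])   # CSup' is the min over the class
        t["phi"] |= set(d["phi"])
        t["sons"] |= {h(s) for s in d["sons"]}
        t["split"] |= {h_view(v) for v in d["split"]}
        t["comp"] |= {h_view(v) for v in d["comp"]}
    return result

def node(cinf, csup, sons=(), split=()):
    # helper for building example nodes (illustrative only)
    return dict(cinf=cinf, csup=csup, phi=set(), sons=set(sons),
                split=set(split), comp=set())

A, B, R = frozenset({"a"}), frozenset({"b"}), frozenset({"r"})
diagram = {R: node(0, 9, sons=[A, B], split=[frozenset({A, B})]),
           A: node(1, 3), B: node(2, 5)}
factored = merge(diagram, [A, B])
AB = frozenset({"a", "b"})
print(factored[AB]["cinf"], factored[AB]["csup"], factored[R]["sons"] == {AB})
```

In this example the split view {A, B} of R collapses to the singleton view {AB}; the sketch leaves such views in place.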

In the sequel we impose the following restrictions on the form 
of i-diagrams. 

DEFINITION 6 (Simple Diagrams). A diagram 𝒟 is simple iff
𝒟 = ⊥_d or all of the following conditions hold for all S ∈ 𝕊:

a) (𝕊, ⇝) has no cycles, in particular S ∉ Sons(S)

b) ∅_d ∉ Sons(S)

c) ∅ ∉ Split(S) ∧ ∅ ∉ Comp(S)

d) ∀Q, Q'. Q ∈ Split(S) ∧ Q' ⊊ Q ⇒ Q' ∉ Split(S)

e) ∀Q, Q'. Q ∈ Comp(S) ∧ Q' ⊋ Q ⇒ Q' ∉ Comp(S)

f) CSup(∅_d) = 0, Sons(∅_d) = Φ(∅_d) = ∅



proc Simplify(𝒟):

1. use fixpoint iteration to compute ρ as
   the smallest equivalence relation such that:

   1.1. S1 ⇝* S2 ∧ S2 ⇝* S1 ⇒ (S1, S2) ∈ ρ

   1.2. (S, ∅_d) ∈ ρ ∧ S1 ⇝* S ⇒ (S1, ∅_d) ∈ ρ

   1.3. ∅ ∈ Comp(S) ⇒ (S, ∅_d) ∈ ρ

   1.4. disj_{𝒟,S0}(S1, S2) ∧ (S1, S2) ∈ ρ ⇒ (S2, ∅_d) ∈ ρ

   1.5. disj_{𝒟,S0}(S1, S2) ∧ (S0, S1) ∈ ρ ⇒ (S2, ∅_d) ∈ ρ

   1.6. Q ∈ Comp(S1) ∧ (∀S ∈ Q. (S, ∅_d) ∈ ρ) ⇒ (S1, ∅_d) ∈ ρ

2. 𝒟 := 𝒟/ρ

3. Split(S) ← {Q - {∅_d} | Q ∈ Split(S), S ∉ Q}
   Comp(S) ← {Q - {∅_d} | Q ∈ Comp(S), S ∉ Q}
   Sons(S) ← Sons(S) - {∅_d, S}

4. Split(S) ← Split(S) - {∅}
                       - {Q | ∃Q' ∈ Split(S), Q' ⊋ Q}
   Comp(S) ← Comp(S) - {∅}
                     - {Q | ∃Q' ∈ Comp(S), Q' ⊊ Q}

5. CSup(∅_d) ← 0
   Φ(∅_d) ← ∅
   Sons(∅_d) ← ∅
   Comp(∅_d) ← ∅
   Split(∅_d) ← ∅

6. return 𝒟






Where 𝒟[a ← b] denotes the result of updating the component a of
i-diagram 𝒟 with value b.

Figure 6. Polynomial-time algorithm Simplify to compute an 
equivalent simple i-diagram 



Simplicity eliminates redundancy from diagrams, but does not re- 
strict their expressive power, as the following lemma shows. 

LEMMA 4. For every i-diagram 𝒟 we can obtain an equivalent
simple i-diagram using the polynomial-time algorithm Simplify
in Figure 6.



5. Sources of NP Hardness and Definition of 
I-Trees 

The satisfiability of i-diagrams is NP-hard because i-diagrams have
the same expressive power as CBAC constraints. We have observed
that the general directed acyclic graph structure of i-diagrams
allows us to encode NP-complete problems; this motivates the
following two restrictions.

DEFINITION 7.

An i-diagram 𝒟 is tree shaped iff

(𝕊, ⇝) is a tree (with an additional isolated node ∅_d).

An i-diagram 𝒟 has independent views iff

for all Q1, Q2 ∈ Split(S) ∪ Comp(S) at least one of the following
two conditions holds:

• Q1 ∩ Q2 = ∅

• Q1 ∈ Split(S) ∧ Q2 ∈ Comp(S) ∧ Q1 ⊆ Q2.

Recall that, by Lemma 4, it suffices to consider i-diagrams with
acyclic graphs of the subset relation. The tree shape condition is
then a natural next restriction on the structure of i-diagrams.
However, due to the presence of Split and Comp, the tree shape
condition by itself does not reduce the expressive power of i-diagrams,
and further restrictions are necessary. The independent views
condition extends the tree condition to the entire graphical
representation of i-diagrams, including the circles and squares that
represent Split and Comp views. The conjunction of these two
conditions can be expressed by saying that the graphical
representation of the i-diagram is a tree.

REMARK 1. We can express the combination of the conditions
of being simple, being tree shaped, and having independent views by
saying that there are only four kinds of edges in the corresponding
graphical representation:²

• from an element S ∈ 𝕊 - {∅_d} to a circle,

• from a circle to a square, indicating that all nodes of a split view
belong to a complete view,

• from a circle to an element S ∈ 𝕊 - {∅_d},

• from a square to an element S ∈ 𝕊 - {∅_d}.

Unfortunately, the restrictions on tree shape and independent views
are not sufficient to guarantee a polynomial-time decision
procedure in the presence of predicates associated with nodes. The
reason is that the ability to encode disjointness of arbitrary sets
leads to NP-hardness, yet even with tree structure and independent
views it is possible to assert that two arbitrary sets S1 and S2 are
disjoint by letting (+P) ∈ Φ(S1) and (-P) ∈ Φ(S2) for some
uninterpreted predicate P. A simple way to avoid this problem is to
require that Φ contains only positive atoms (+P). A more flexible
restriction is the following.

DEFINITION 8. An i-diagram 𝒟 has independent signatures iff

for every pair of distinct nodes S1, S2 such that (-P) ∈ Φ(S1)
and (+P) ∈ Φ(S2) for some P ∈ PN, at least one of the following
two conditions holds:

1. S1 and S2 are explicitly disjoint, that is, disj*_𝒟(S1, S2)

2. S1 and S2 have compatible signatures, that is, there exists a
node S such that

S1 ⇝* S ∧ S2 ⇝* S ∧
Sig(S1) ∩ Sig(S2) ⊆ Sig(S)

where Sig(S) = {P | (+P) ∈ Φ(S) ∨ (-P) ∈ Φ(S)}.

The independent signatures condition ensures that any disjointness
condition is either 1) a result of the fact that the ancestors of
S1 and S2 are explicitly stated as disjoint, or 2) a result of a
contradictory predicate assignment (the case when S1 and S2 have
compatible signatures, so there exists a parent that resolves which
of (+P) or (-P) holds for both S1 and S2).

The discussion above leads to the definition of i-trees, for which
we will give polynomial-time algorithms for satisfiability and
subsumption in Sections 6 and 7.

DEFINITION 9 (i-trees, iT). An i-tree T is a simple i-diagram
such that T = ⊥_d or such that all of the following three conditions
hold:

1. T is tree shaped 

2. T has independent views 

3. T has independent signatures. 

We denote by iT the set of i-trees. 

The following theorem justifies why all three conditions in our def- 
inition of i-trees are necessary. Its proof is based on a reduction 
from graph 3-colorability, which can be encoded using slightly dif- 
ferent i-diagrams for each of the three cases. The common property 
of these diagrams is that they can encode disjointness of arbitrary 
pairs of nodes. 

2 As a result, we can recognize this structure in linear time using, for
example, a tree automaton [12].



THEOREM 2. Omitting any one of the three conditions from
Definition 9 yields a class of diagrams whose satisfiability is NP-hard.

We note that in addition to NP-hardness, the omission of the tree
shaped or independent views properties in fact retains the full
expressive power of CBAC constraints, using a similar argument
as in Lemma 3.

Our ability to specify i-trees as a natural subclass of i-diagrams
justifies the definition of i-diagrams themselves. For example, the
definition of i-trees would have been more complex had we chosen
to represent disjointness using a binary relation s1 ∩ s2 = ∅.

Let us also observe that, despite the imposed restrictions, i-trees
are fairly expressive. In particular they can express hierarchical
decomposition of a set given by a node S into disjoint sets
S1, ..., Sn, by letting {S1, ..., Sn} ∈ Split(S) ∩ Comp(S).
Despite the independent views condition, we can have multiple
orthogonal decompositions, so {S1', ..., Sm'} ∈ Split(S) ∩ Comp(S) for
{S1', ..., Sm'} ∩ {S1, ..., Sn} = ∅. This allows i-trees to naturally
express generalized typestate constraints.

6. Deciding the Satisfiability of I-Trees 

In this section we prove that the satisfiability of i-trees is decidable
in polynomial time. For this purpose we introduce a set of weak
consistency conditions (Definition 10) such that:

6.1 We can enforce weak consistency for any satisfiable i-tree
using a rewriting system ℛ^W (Definition 11) with the following
properties (Lemma 5):

• ℛ^W is semantics-preserving;

• if a non-⊥_d i-tree is in ℛ^W normal form, then it satisfies
the weak consistency conditions;

• for a particular strategy (Figure 9) the system ℛ^W terminates
in polynomial time.

6.2 Every i-tree that satisfies the weak consistency conditions is
satisfiable; Lemma 6 gives an algorithm for constructing a model for
any i-tree that satisfies the weak consistency conditions.

Figure 9 summarizes the polynomial-time satisfiability decision
procedure whose correctness (Theorem 3) follows from the results
of this section.

DEFINITION 10 (Weak Consistency). An i-tree T satisfies weak
consistency iff T ≠ ⊥_d and T satisfies the following conditions
for all S ∈ 𝕊:

∀S' ∈ Sons(S). Φ(S') ⊇ Φ(S)                       (C1)

CSup(S) > 0 ⇒ ∀P ∈ PN. {+P, -P} ⊈ Φ(S)            (C2)

∀Q ∈ Comp(S). CSup(S) ≤ Σ(CSup[Q])                (C3)

∀Q ∈ Split(S). CInf(S) ≥ Σ(CInf[Q])               (C4)

CInf(S) ≤ CSup(S)                                 (C5)

6.1 A Rewriting System ℛ^W for Enforcing Weak Consistency

We introduce the following rewriting system to enforce weak
consistency properties when possible.

DEFINITION 11 (System ℛ^W). For each tuple (k, name,
condition, effect) in Figure 7 we define a rewriting rule on
i-diagrams by

𝒟 →_{name} 𝒟'  ⇔  (𝒟 ≠ ⊥_d ∧ condition ∧ 𝒟' = 𝒟[effect])

for each assignment spot of the free variables appearing in the
condition column. We define R_k by

𝒟 →_{R_k} 𝒟'  ⇔  ∃spot. 𝒟 →_{name} 𝒟'

We define ℛ^W as the union of →_{R_j} for 1 ≤ j ≤ 5.

k   name    conditions                   effect

1   DnPhi   a1) S ∈ Sons(S')             Φ(S) ← φ_n
            b1) φ_n = Φ(S) ∪ Φ(S')
            c1) φ_n ⊈ Φ(S)

2   Unsat   a2) {+P, -P} ⊆ Φ(S)          CSup(S) ← n
            b2) n = 0
            c2) CSup(S) > n

3   UpSup   a3) Q ∈ Comp(S)              CSup(S) ← n
            b3) n = Σ(CSup[Q])
            c3) CSup(S) > n

4   UpInf   a4) Q ∈ Split(S)             CInf(S) ← n
            b4) n = Σ(CInf[Q])
            c4) CInf(S) < n

5   Error   a5) CInf(S) > CSup(S)        ⊥_d

Figure 7. System ℛ^W for ensuring weak consistency




[Figure 8: an i-tree rewritten successively by DnPhi, Unsat, UpSup, and Error]

Figure 8. An example sequence of rewriting steps for ℛ^W

Figure 8 shows an example sequence of rewriting steps applied to
an i-tree.

LEMMA 5 (Properties of ℛ^W).

1. ℛ^W is iT-stable, that is

T ∈ iT ∧ T →_{ℛ^W} T' ⇒ T' ∈ iT

2. ℛ^W preserves the semantics, that is

𝒟 →_{ℛ^W} 𝒟' ⇒ 𝒟 ≡ 𝒟'

3. ℛ^W enforces weak consistency when possible, that is, a
diagram 𝒟 in ℛ^W normal form is either equal to ⊥_d or it is weakly
consistent

4. ℛ^W terminates in polynomial time for the strategy
corresponding to the algorithm R^W_NF in Figure 9.

Proof sketch.

1. Follows easily from the fact that ℛ^W rules do not modify Sons,
Split, Comp.

2. Follows by construction of ℛ^W rules. Suppose 𝒟 → 𝒟'. Then
𝒟 ⊨ 𝒟' follows from conditions a_i (1 ≤ i ≤ 5), and 𝒟' ⊨ 𝒟
follows from conditions c_i (1 ≤ i ≤ 4).

3. For every k = 1..5, the condition of application of the rule R_k
corresponds to the negation of C_k. When a diagram is in normal
form for the rule R_k, it either satisfies C_k, or is ⊥_d.

4. To prove that R^W_NF corresponds to a polynomial strategy, we
prove by induction that applying the rule R_k in the specified
direction (from the root to the leaves or from the leaves
to the root) enforces C_k everywhere, and when C_k holds, the
rule is not applicable anymore. Finally, we prove that each
rule R_k for k = 1..5 preserves the conjunction of properties
C_1 ∧ ... ∧ C_{k-1}, and as a consequence, we never need to
reapply any of the rules R_j for j < k. ∎

proc R^W_NF(T)

1. for every S ∈ 𝕊 from the root to the leaves
     for every Q ∈ Comp(S)
       try to apply DnPhi(S, Q) to T

2. for every S ∈ 𝕊
     for every P ∈ PN
       try to apply Unsat(S, P) to T

3. for every S ∈ 𝕊 from the leaves to the root
     for every Q ∈ Comp(S)
       try to apply UpSup(S, Q) to T

4. for every S ∈ 𝕊 from the leaves to the root
     for every Q ∈ Split(S)
       try to apply UpInf(S, Q) to T

5. for every S ∈ 𝕊
     try to apply Error(S) to T

return T

proc ItreeSAT(T)

  if (R^W_NF(T) ≠ ⊥_d) return satisfiable
  else return unsatisfiable

Figure 9. Polynomial-time algorithms R^W_NF and ItreeSAT to
compute ℛ^W normal form and check satisfiability of i-trees
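
The five passes of Figure 9 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `Node` class, the list-based encoding of views, and the names `weak_normal_form` and `itree_sat` are hypothetical; the input is assumed to already be an i-tree without the node ∅_d; the sons of a node are read off its split views (relying on the requirement ⋃Split(S) = Sons(S)); and the DnPhi pass iterates over the sons of each node, following condition a1 of the rule. The function returns False where the rewriting system would produce ⊥_d.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    cinf: int
    csup: int
    phi: set = field(default_factory=set)      # signed atoms, e.g. ("+", "P")
    split: list = field(default_factory=list)  # split views: lists of Node
    comp: list = field(default_factory=list)   # complete views: lists of Node

    def sons(self):
        # assumes the split views cover all sons of the node
        return [son for view in self.split for son in view]

def preorder(root):
    """Nodes of the tree, every father before its descendants."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(node.sons())
    return order

def weak_normal_form(root):
    """Apply rules 1..5 in the order of Figure 9; False stands for bottom."""
    top_down = preorder(root)
    bottom_up = list(reversed(top_down))
    for s in top_down:                          # 1. DnPhi: push atoms to sons
        for son in s.sons():
            son.phi |= s.phi
    for s in top_down:                          # 2. Unsat: contradictory atoms
        if any(("+", p) in s.phi and ("-", p) in s.phi for _, p in s.phi):
            s.csup = min(s.csup, 0)
    for s in bottom_up:                         # 3. UpSup: complete views
        for q in s.comp:
            s.csup = min(s.csup, sum(x.csup for x in q))
    for s in bottom_up:                         # 4. UpInf: split views
        for q in s.split:
            s.cinf = max(s.cinf, sum(x.cinf for x in q))
    return all(s.cinf <= s.csup for s in top_down)   # 5. Error

def itree_sat(root):
    return weak_normal_form(root)

a, b = Node(2, 3), Node(1, 1)
root = Node(0, 5, split=[[a, b]], comp=[[a, b]])
print(itree_sat(root), root.cinf, root.csup)

c = Node(2, 3, phi={("+", "P")})
root2 = Node(0, 5, phi={("-", "P")}, split=[[c]])
print(itree_sat(root2))
```

In the second example the atom (-P) is pushed down to the son, which then carries both (+P) and (-P); its upper bound drops to 0 while its lower bound stays 2, so the Error rule fires.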

6.2 Constructing Models for Weakly Consistent I-Trees 

The following Lemma 6 is crucial for the completeness of our
algorithm, and justifies the definition of weak consistency.

LEMMA 6 (Model Construction). If an i-tree T is weakly consis- 
tent, then we can construct a model for T. 

The high-level idea of the proof of Lemma 6 is to first build the first
two components (Δ, α) of the model, and then extend the model
with Ξ using the independent signatures condition for i-trees. We
build the (Δ, α) part of the model by building a model for each
subtree using an induction on the height of the i-tree. To construct
models that satisfy Split and Comp constraints in the inductive
step, we use a stronger induction hypothesis: we show that there
exists a model (Δ, α) for a tree rooted in node S with |Δ| = k
for all CInf(S) ≤ k ≤ CSup(S), and we rely on the properties
of weak consistency to prove the inductive step. The proof of this
lemma is interesting because similar ideas are used when building
example models that show the completeness in Section 7.

Putting all results in this section together using the argument at
the beginning of the section, we obtain the following theorem.

THEOREM 3 (ItreeSAT Correctness). T is satisfiable if and only
if R^W_NF(T) ≠ ⊥_d. Therefore, the algorithm ItreeSAT in Figure 9 is
a sound and complete polynomial-time decision procedure for the
satisfiability of i-trees.

7. Deciding Subsumption of I-Trees 

The goal of this section is to prove that we can decide the
subsumption of i-trees in polynomial time. Note that the subclass of
i-trees is not closed under negation or implication, so we cannot
decide T ⊨ T' by checking the satisfiability of ¬(T ⇒ T'). Instead,
our approach is to bring T into a form where the properties of the
models of T are easy to read from T. We then check that T
entails each of the conditions that correspond to the semantics of T'.

k    name     condition                           effect

6    DnInf    a6) ({S} ⊎ Q0) ∈ Comp(S')           CInf(S) ← n
              b6) n = CInf(S') - Σ(CSup[Q0])
              c6) n > CInf(S)

7    DnSup    a7) ({S} ⊎ Q0) ∈ Split(S')          CSup(S) ← n
              b7) n = CSup(S') - Σ(CInf[Q0])
              c7) n < CSup(S)

8    CCmp*    a8) Q ∈ Split(S) ∧                  Comp(S) ← C_n
                  CSup(S) ≤ Σ(CInf[Q])
              b8) C_n = Comp(S) ∪ {Q}
              c8) C_n ⊈ Comp(S)

9    CSplit*  a9) Q ∈ Comp(S) ∧                   Split(S) ← C_n
                  CInf(S) ≥ Σ(CSup[Q])
              b9) C_n = Split(S) ∪ {Q}
              c9) C_n ⊈ Split(S)

10   UpPhi    a10) Q ∈ Comp(S)                    Φ(S) ← φ_n
              b10) φ_n = Φ(S) ∪ ⋂Φ[Q]
              c10) φ_n ⊈ Φ(S)

11   Void*    a11) S ≠ ∅_d ∧ CSup(S) = 0          Merge({S, ∅_d})

12   Equal*   a12) {S'} ∈ Comp(S)                 Merge({S, S'})

* Follow the application of these rules by Simplify.

Figure 10. Rules for System ℛ

We formalize the intuitive condition of being easy to read in the
notion of strong consistency. We build on the system ℛ^W from the
previous section to create a larger rewriting system ℛ for ensuring
strong consistency. We introduce a polynomial-time strategy for ℛ
that transforms every i-tree into ⊥_d or into an i-tree that is strongly
consistent, and we give polynomial-time algorithms for extracting
the information from strongly consistent i-trees.
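
As an illustration of how the downward rules DnInf and DnSup of Figure 10 tighten the bounds of a son from the bounds of its father and siblings, the following sketch applies both rules to one view. The encoding (a dictionary from node names to `(CInf, CSup)` pairs) and the function names `dn_inf` and `dn_sup` are assumptions made for this example only.

```python
def dn_inf(bounds, father, view):
    """Rule DnInf for every node S of a complete view of `father`."""
    for s in view:
        rest = sum(bounds[x][1] for x in view if x != s)   # sum of CSup over Q0
        n = bounds[father][0] - rest
        if n > bounds[s][0]:                               # condition c6
            bounds[s] = (n, bounds[s][1])

def dn_sup(bounds, father, view):
    """Rule DnSup for every node S of a split view of `father`."""
    for s in view:
        rest = sum(bounds[x][0] for x in view if x != s)   # sum of CInf over Q0
        n = bounds[father][1] - rest
        if n < bounds[s][1]:                               # condition c7
            bounds[s] = (bounds[s][0], n)

# R has bounds [5..6]; {A, B} is both a complete view and a split view of R.
bounds = {"R": (5, 6), "A": (0, 3), "B": (0, 10)}
dn_inf(bounds, "R", ["A", "B"])
dn_sup(bounds, "R", ["A", "B"])
print(bounds)
```

Since A contains at most 3 elements and A and B together cover R, which has at least 5 elements, B must contain at least 2; since A and B are disjoint subsets of R, B contains at most 6.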



DEFINITION 12 (Strong Consistency). An i-tree T is strongly
consistent iff it is weakly consistent and satisfies all of the following
properties:

∀Q ∈ Comp(S). ∀S0 ∈ Q.
  CInf(S0) ≥ CInf(S) - Σ(CSup[Q - {S0}])            (C6)

∀Q ∈ Split(S). ∀S0 ∈ Q.
  CSup(S0) ≤ CSup(S) - Σ(CInf[Q - {S0}])            (C7)

∀Q ∈ Split(S). Q ∉ Comp(S) ⇒
  CSup(S) > Σ(CInf[Q])                              (C8)

∀Q ∈ Comp(S). Q ∉ Split(S) ⇒
  CInf(S) < Σ(CSup[Q])                              (C9)

∀Q ∈ Comp(S). ⋂(Φ[Q]) ⊆ Φ(S)                        (C10)

S ≠ ∅_d ⇒ CSup(S) > 0                               (C11)

Q ∈ Comp(S) ⇒ |Q| > 1                               (C12)



7.1 A Rewriting System ℛ to Enforce Strong Consistency

This section follows the development of Section 6.1.

DEFINITION 13 (System ℛ). The system ℛ extends ℛ^W with the
additional rules of Figure 10, analogously to Definition 11.



LEMMA 7 (Properties of ℛ).

1. ℛ is iT-stable, that is

T ∈ iT ∧ T →_ℛ T' ⇒ T' ∈ iT

2. ℛ preserves the semantics, that is

𝒟 →_ℛ 𝒟' ⇒ 𝒟 ≡ 𝒟'

proc R_NF(T)

1..5. T ← R^W_NF(T)

6. for each S' ∈ 𝕊 from the root to the leaves
     for each Q ∈ Comp(S')
       for each S ∈ Q
         try DnInf(S, S', Q)

7. for each S' ∈ 𝕊 from the root to the leaves
     for each Q ∈ Split(S')
       for each S ∈ Q
         try DnSup(S, S', Q)

8. for each S ∈ 𝕊
     for each Q ∈ Split(S)
       try CCmp(S, Q)

9. for each S ∈ 𝕊
     for each Q ∈ Comp(S)
       try CSplit(S, Q)

10. for each S ∈ 𝕊 from the leaves to the root
      for each Q ∈ Comp(S)
        try UpPhi(S, Q)

11. for each S ∈ 𝕊 from the leaves to the root
      try Void(S)

12. for each S ∈ 𝕊
      for each Q ∈ Comp(S)
        try Equal(S, Q)

return T

Figure 11. Polynomial-time algorithm R_NF(T) to compute ℛ
normal form



3. ℛ enforces strong consistency when possible, that is, a diagram
𝒟 in ℛ normal form is either equal to ⊥_d or it is strongly
consistent.

4. ℛ terminates in polynomial time for the strategy corresponding
to the algorithm R_NF described in Figure 11.

Proof sketch.

1. The iT-stability is trivial for the rules DnInf, DnSup,
UpPhi. The other rules are marked with a star and we use the
algorithm Simplify. In fact, we can show that it is not necessary
to apply Simplify in its full generality, but only to remove any
redundant views introduced by CCmp and CSplit, remove
any self edges introduced by the operation Merge used in the
rules Equal and Void, and to remove the edges going to ∅_d
that can be introduced by the rule Void.

2, 3. Follow by construction as in the previous section.

4. This part is significantly more difficult than for system ℛ^W,
because the interactions between the rules are more complex,
but follows the same structure as the proof for ℛ^W.

7.2 Extracting Information from Strongly Consistent I-Trees 

In this section we start from a strongly consistent i-tree T and
consider the problem of checking T ⊨ 𝒟'. Analyzing Definition 2
we observe that a diagram corresponds to a conjunction of
constraints. Therefore, the subsumption problem T ⊨ 𝒟' corresponds
to the problem of verifying that T entails atomic formulas of the
form s = ∅, s1 = s2, s1 ⊆ s2, a ≤ |s| ≤ b, s ⊆ P, s ⊆ P^c,
s1 ∩ s2 = ∅ and s ⊆ ⋃{s1, ..., sn}. Without the danger of
confusion, we write T ⊨ A when the atomic formula A holds in all
models for T.

THEOREM 4. Let T be a strongly consistent i-tree and let ⊢_T A for
atomic formula A be as defined in Figure 12. Then T ⊨ A if and
only if ⊢_T A.

proc Subsumes(T, 𝒟')
  T := R_NF(T)

  let f : SN → 𝕊 such that ∀s ∈ SN. s ∈ f(s)

  let h' : 𝕊' → SN be any function such that ∀S' ∈ 𝕊'. h'(S') ∈ S'

  check all of the following conditions:

  1. ⋀_{S ∈ 𝕊'} ⋀_{s1, s2 ∈ S} ⊢_T s1 = s2
     where ⊢_T s1 = s2 ⇔ f(s1) = f(s2)

  2. ⊢_T h'(∅_d') = ∅
     where ⊢_T s = ∅ ⇔ f(s) = ∅_d

  3. ⋀_{S ∈ 𝕊'} ⊢_T CInf'(S) ≤ |h'(S)| ≤ CSup'(S)
     where ⊢_T a ≤ |s| ≤ b ⇔ a ≤ CInf(f(s)) ∧ CSup(f(s)) ≤ b

  4. ⋀_{S ∈ 𝕊'} ⋀_{(+P) ∈ Φ'(S)} ⊢_T P(h'(S))
     where ⊢_T P(s) ⇔ (+P) ∈ Φ(f(s))

  5. ⋀_{S ∈ 𝕊'} ⋀_{(-P) ∈ Φ'(S)} ⊢_T ¬P(h'(S))
     where ⊢_T ¬P(s) ⇔ (-P) ∈ Φ(f(s))

  6. ⋀_{S ∈ 𝕊'} ⋀_{S' ∈ Sons'(S)} ⊢_T h'(S') ⊆ h'(S)
     where ⊢_T s1 ⊆ s2 ⇔ f(s1) = ∅_d ∨ f(s1) ⇝* f(s2)

  7. ⋀_{S ∈ 𝕊'} ⋀_{Q ∈ Split'(S)} ⋀_{S1, S2 ∈ Q, S1 ≠ S2} ⊢_T h'(S1) ∩ h'(S2) = ∅
     where ⊢_T s1 ∩ s2 = ∅ ⇔ f(s1) = ∅_d ∨ f(s2) = ∅_d ∨
                              disj*_T(f(s1), f(s2))

  8. ⋀_{S ∈ 𝕊'} ⋀_{Q ∈ Comp'(S)} ⊢_T h'(S) ⊆ ⋃h'[Q]
     where ⊢_T s ⊆ ⋃Z ⇔ f(s) = ∅_d ∨ Included(f(s), f[Z], T)

  where proc Included(S0, C, T)
          return ⋁_{S0 ⇝* S} Incl(S)

        proc Incl(S)
          if S ∈ C then return true
          else return ⋁_{Q ∈ Comp(S)} (⋀_{S' ∈ Q} Incl(S'))

Figure 12. An algorithm for computing T ⊨ 𝒟' for an i-tree
T and an arbitrary diagram 𝒟'.
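
The procedures Included and Incl of Figure 12 translate directly into a short recursive function. The sketch below assumes a hypothetical encoding of the tree by a `parent` map (son to father, with the root absent) and a `comp` map (node to list of complete views); these names and the function `included` itself are illustrative and not from the paper.

```python
def included(s0, c, parent, comp):
    """True iff some ancestor of s0 (s0 included) is covered by nodes of c."""
    def incl(s):
        if s in c:
            return True
        # s is covered if all nodes of one of its complete views are covered
        return any(all(incl(x) for x in view) for view in comp.get(s, ()))

    s = s0
    while s is not None:        # walk from s0 up to the root
        if incl(s):
            return True
        s = parent.get(s)
    return False

# R is completely covered by {A, B}; A is completely covered by {A1, A2}.
parent = {"A": "R", "B": "R", "A1": "A", "A2": "A"}
comp = {"R": [("A", "B")], "A": [("A1", "A2")]}
print(included("R", {"A1", "A2", "B"}, parent, comp))
print(included("A1", {"A2"}, parent, comp))
print(included("A1", {"A"}, parent, comp))
```

The walk towards the root reflects the disjunction over all S with S0 ⇝* S: if an ancestor of S0 is covered by C, then so is S0.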



It is easy to verify that ⊢_T A implies T ⊨ A. The proof of the
converse is based on the following two lemmas, which provide a
link between strong and weak consistency.

LEMMA 8 (Bounds Refinement). Let T be a strongly consistent
i-tree, S ∈ 𝕊, i, s such that CInf(S) ≤ i ≤ s ≤ CSup(S), let
T' = T[CInf(S) ← i, CSup(S) ← s] and T'_NF = R^W_NF(T'). Then
1) T'_NF ≠ ⊥_d, 2) T'_NF ⊨ T, and 3) if ¬(S ⇝* S0), then

(CInf(S0), CSup(S0)) = (CInf'_NF(S0), CSup'_NF(S0)).
Proof sketch.

1. We prove this result by induction on the depth of S in the tree
(𝕊, ⇝). The key step of this proof is to show that the application
of UpSup and/or UpInf to the father S' of S does not produce
a situation where a5 holds in the resulting diagram T'' (and
therefore the rule Error is not applicable in T''). We use the fact
that R^W_NF applies the rules UpInf and UpSup bottom up, and
prove that each application preserves C5, only increases CInf
and only decreases CSup. At each step we distinguish three
cases:

(a) both UpSup and UpInf are applicable; then the result
follows from C5;

(b) only UpSup is applicable; then the result follows from C6;

(c) only UpInf is applicable; then the result follows from C7.

2. Follows easily from the hypothesis CInf(S) ≤ i ≤ s ≤
CSup(S) and the fact that R^W_NF is semantics preserving.

3. It is enough to notice that only rules UpInf and UpSup are used
when applying R^W_NF, and these rules are applied in the
bottom-up direction. ∎

The fact that the resulting i-tree T'_NF is no longer strongly
consistent prevents us from applying this lemma twice starting from a
given strongly consistent i-tree. To enforce more than one
restriction, we need to refine the bounds of several nodes
simultaneously. For this purpose, we use the following lemma.

LEMMA 9 (Parallel Bounds Refinement). Let T be a strongly
consistent i-tree, and (Q0, ⇝) a subtree of T such that

• The nodes of Q0 are pairwise independent, that is,

∀S1, S2 ∈ Q0. ¬(disj*_T(S1, S2))

• (Q0, ⇝) has the same root as T.

Then the i-tree T' defined by the simultaneous update

T' =def T[∀S ∈ Q0 : CInf(S) ← CSup(S)]

is such that its ℛ^W normal form T'_NF = R^W_NF(T') satisfies

1. T'_NF ≠ ⊥_d

2. T'_NF ⊨ T

Lemmas 8 and 9 are the basic tools we need to show that
the information syntactically computed from an i-tree is the most
precise information computable from the semantics of the i-tree.
We prove this property for each of the atomic formulas A.

LEMMA 10. If an i-tree T is strongly consistent, then for all S ∈ 𝕊
we have

S ≠ ∅_d ⇒ ∃M. M ⊨ T ∧ α_M(S) ≠ ∅

Proof. If S ≠ ∅_d, we have CSup(S) > 0 by C11 and therefore
the i-tree T' =def T[CInf(S) ← max(1, CInf(S))] subsumes T. By
Lemma 8, T' is satisfiable, and we can take any model of T' as a
model of T. ∎

LEMMA 11. If an i-tree T is strongly consistent, then for all S ∈ 𝕊
we have

∃M. M ⊨ T ∧ |α_M(S)| = CInf(S)
∃M. M ⊨ T ∧ |α_M(S)| = CSup(S)

Proof. According to Lemma 8 the two i-trees

T1' =def T[CSup(S) ← CInf(S)] and T2' =def T[CInf(S) ← CSup(S)]

are satisfiable, and both T1' and T2' trivially subsume T. Any model
M1 of T1' is such that |α_{M1}(S)| = CInf(S), and any model M2
of T2' is such that |α_{M2}(S)| = CSup(S). ∎

LEMMA 12. If an i-tree T is strongly consistent, S0 ∈ 𝕊, C ∈
𝒫(𝕊), and Included(S0, C, T) returns false, then

∃M. M ⊨ T ∧ α_M(S0) ⊈ ⋃α_M[C]

Proof sketch. Assume that Included returns false. We argue
that the model M exists in several steps. Let Q0 be the smallest
set of nodes such that:

S0 ⇝* S ⇒ S ∈ Q0
S ∈ Q0 ∧ C1 ∈ Comp(S) ∧ S1 ∈ C1 ∧ ¬Incl(S1) ⇒ S1 ∈ Q0

By definition of Incl, we have Q0 ∩ C = ∅.

Q0 is tree-shaped by construction, but may contain two nodes
that are explicitly disjoint. We therefore compute a subtree Q1 of
Q0, by starting from the root and keeping at most one son for each
complete view. In this process we ensure that Q1 contains S0 by
never cutting the branch that leads to S0.

We then apply Lemma 9 to Q1 and construct a model of the
resulting i-tree while enforcing that a certain element x ∈ α(S) is
such that x ∈ α(S') ⇔ S' ∈ Q1 for all S' ∈ 𝕊. More precisely,
we prove by induction on n that for each node S1 of Q1 of depth n
in the tree Q1 we can construct a model (Δ, α, Ξ) for the sub-i-tree
of T with root S1 such that

∀S'. S' ⇝* S1 ⇒ (x ∈ α(S') ⇔ S' ∈ Q1)

If n = 0 then S1 is a leaf of (Q1, ⇝) and has no complete view by
construction of Q1. Then using C8 we show that we can construct
a model of the sub-i-tree with root S1 containing a fresh element
(not included in any of the sons of S1).

If n > 0, we can deal with the split views in the same way, but
this time S1 can have some complete views. If this complete view
contains a unique split view, we avoid merging x in a son of S1 with
elements in the other sons of S1. If there exists more than one split
view, we can use C8 and construct the model using a refinement of
the ideas of Lemma 6.

Finally, since S0 ∈ Q1, we have x ∈ α(S0), and since
C ∩ Q1 = ∅ we have x ∉ ⋃α[C]. ∎

LEMMA 13. If an i-tree T is strongly consistent, for all S1, S2 ∈ 𝕊
such that S1 ≠ ∅_d and S2 ≠ ∅_d we have

¬(S1 ⇝* S2) ⇒ ∃M. M ⊨ T ∧ α_M(S1) ⊈ α_M(S2)

Proof. The property ∃M. M ⊨ T ∧ α_M(S1) ⊈ α_M(S2) can
be checked using Included(S1, {S2}, T). Using C12 we show
that this test is equivalent to testing S1 ⇝* S2. ∎

LEMMA 14. If an i-tree T is strongly consistent, for all S1, S2 ∈ 𝕊
we have

S1 ≠ S2 ⇒ ∃M. M ⊨ T ∧ α_M(S1) ≠ α_M(S2)

Proof. If S1 ≠ S2, then ¬(S1 ⇝* S2) or ¬(S2 ⇝* S1). In either
case the result follows from Lemma 13. ∎

LEMMA 15. If an i-tree T is strongly consistent, then for all S ∈ 𝕊
and P ∈ PN we have

(+P) ∉ Φ(S) ⇒ ∃M. M ⊨ T ∧ α_M(S) ⊈ Ξ(P)
(-P) ∉ Φ(S) ⇒ ∃M. M ⊨ T ∧ α_M(S) ⊈ Ξ(P)^c

Proof sketch. Let S ∈ 𝕊 be such that (+P) ∉ Φ(S). We define
Q_P =def {S' ∈ 𝕊 | (+P) ∈ Φ(S')}. Using C10 and C1 we show that
Included(S, Q_P, T) returns false. By Lemma 12 there exists a
model such that α(S) ⊈ ⋃α[Q_P]. We then change the model by
redefining Ξ' on PN as Ξ'(P) = ⋃α[Q_P], so α(S) ⊈ Ξ'(P).
The case (-P) ∉ Φ(S) is dual and follows from the previous case
by swapping (+P) and (-P) in the i-tree and taking complements
of Ξ(P). ∎

LEMMA 16. If an i-tree T is strongly consistent, then for all
S1, S2 ∈ 𝕊 such that S1 ≠ ∅_d, S2 ≠ ∅_d we have

¬(disj*_T(S1, S2)) ⇒ ∃M. M ⊨ T ∧ α_M(S1) ∩ α_M(S2) ≠ ∅.



From the previously stated lemmas, we can prove Theorem 4.
From Theorem 4 and Lemma 7 we conclude that the algorithm in
Figure 12 is a correct and complete test for subsumption, not only
between trees, but also between a tree and an arbitrary diagram.

8. Related Work 

Boolean algebras with cardinalities. Satisfiability of quantifier-free
formulas of boolean algebra is NP-complete [33]. Quantified formulas
of boolean algebra are in alternating exponential space with a
linear number of alternations [21]. Cardinality constraints naturally
arise in quantifier elimination for boolean algebras [31, 42, 43].
Quantifier elimination implies that each first-order formula of the
language of boolean algebras is equivalent to some quantifier-free
formula with constant cardinalities; however, quantifier elimination
may introduce an exponential blowup. The first-order theory
of boolean algebras of finite sets with symbolic cardinalities, or,
equivalently, boolean algebras of sets with equicardinality operator
is shown decidable in [14]. These results are repeated, motivated
by constraint solving applications, in [23, 39] and a special case
with quantification over elements only is presented in [44]. Upper
and lower bounds on the complexity of this problem were shown in
[22], which also introduces the name BAPA, for Boolean Algebra
with Presburger Arithmetic. The quantifier-free case of BAPA was
studied in [45] with an NEXPTIME decision procedure, which is
also achieved as a special case of [23, 22]. The new decision
procedure in the present paper improves this bound to PSPACE and
gives insight into the problem by reducing it to boolean algebras
with binary-encoded large cardinalities, and showing that it is not
necessary to explicitly construct all set partitions.

Several decidable fragments of set theory are studied in [10].
Cardinality constraints also occur in description logics [5] and
two-variable logic with counting [35, 19, 38]. However, all logics
of counting that we are aware of have complexity that is beyond
PSPACE.

We are not aware of any previously known fragments of boolean 
algebras of sets with cardinality constraints that have polynomial- 
time satisfiability or subsumption algorithms. Our polynomial-time 
result for i-trees is even more interesting in the light of the fact that 
our constraints can express some "disjunction-like" properties such 
as A = B U C. 

Set constraints. Set constraints Q |5| |2| |5J are incomparable to 
the constraints studied in our paper. On the one hand, set constraints 
are interpreted over ground terms and contain operations that ap- 
ply a given free function symbol to each element of the set, which 
makes them suitable for encoding type inference [4] and
interprocedural analysis [20, 34]. Researchers have also explored the
efficient computation of the subset relation for set constraints [13]. On the
other hand, set constraints do not support cardinality operators that 
are useful in modelling databases 1 40 1 1 1 and analysis of the sizes 
of data structures [29]. Tarskian constraints use uninterpreted
function symbols instead of free function symbols and have very high
complexity [18].

9. Conclusions 

Constraints on sets and relations are very useful for analysis of soft- 
ware artifacts and their abstractions. Reasoning about sets and re- 
lations often involves reasoning about their sizes. For example, an 
integer field may be used to track the size of the set of objects stored 
in a data structure. In this paper, we have presented new complexity 
results and algorithms for solving constraints on boolean algebra 
of sets with symbolic and constant cardinality constraints. We have 
presented symbolic constraints and large constant constraints, given
a more efficient algorithm for quantifier-free symbolic constraints,
identified several sources of NP-hardness of constraints, and
presented a new class of constraints for which satisfiability and
entailment are solvable in polynomial time. We hope that our results
will serve as concrete recipes and general guidance in the design of
algorithms for constraint solving and program analysis.

A. Proofs
A.1 I-Diagrams

Lemma l3l I-diagrams have the same expressive power as CBAC 
constraints. 

Proof. We translate an i-diagram into a CBAC constraint as follows.
As in Figure 2, we note that $b_1 = b_2$ and $b_1 \subseteq b_2$ can be
expressed in the form $|b| = 0$, so we may assume that they are part of
CBAC. Similarly, $|b| \le k$ can be expressed as $b \subseteq s \wedge |s| = k$
for a fresh variable $s$, and $|b| \ge k$ can be expressed as
$s \subseteq b \wedge |s| = k$. We translate $\bot_d$ into e.g. $|\emptyset| = 1$.
Next consider $\mathcal{D} = (\mathbb{S}, \emptyset_d, \mathsf{Sons}, \mathsf{Split}, \mathsf{Comp}, \mathsf{CInf}, \mathsf{CSup}, \Phi)$.
For each $S \in \mathbb{S}$, let $\eta(S) \in S$ be a representative set name.
For each $S_1 \in S \setminus \{\eta(S)\}$ introduce conjunct $S_1 = \eta(S)$.
Next, for each $S_1 \in \mathsf{Sons}(S)$, introduce a conjunct $S_1 \subseteq S$.
For each $+P \in \Phi(S)$, introduce conjunct $S \subseteq P$, and for each
$-P \in \Phi(S)$ conjunct $S \subseteq P^c$. Express the bounds using
conjuncts $|S| \le \mathsf{CSup}(S)$ and $|S| \ge \mathsf{CInf}(S)$.
For each $Q \in \mathsf{Split}(S)$ and $S_1, S_2 \in Q$ where $S_1 \ne S_2$,
introduce conjunct $|S_1 \cap S_2| = 0$. For each
$\{S_1, \ldots, S_n\} \in \mathsf{Comp}(S)$ introduce conjunct
$S = S_1 \cup \ldots \cup S_n$.
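
The translation above can be illustrated with a small sketch. It assumes a hypothetical encoding in which each node is a `frozenset` of set names and the components of the diagram are plain dictionaries; the function name `diagram_to_cbac`, the choice of the lexicographically smallest name as representative, and the textual conjunct syntax (`<=` for subset, `&` for intersection, `|` for union) are assumptions made for illustration only, not part of the paper.

```python
from itertools import combinations


def diagram_to_cbac(nodes, sons, split, comp, cinf, csup, phi):
    """Emit CBAC conjuncts (as strings) for a small i-diagram.

    Hypothetical encoding: every node is a frozenset of set names;
    sons/split/comp/phi are dicts keyed by node, cinf/csup map nodes to ints.
    """
    rep = {S: min(S) for S in nodes}  # representative name, playing the role of eta(S)
    conj = []
    for S in nodes:
        # All names inside one node denote the same set.
        for name in sorted(S - {rep[S]}):
            conj.append(f"{name} = {rep[S]}")
        # Every son is a subset of its father.
        for S1 in sorted(sons.get(S, ()), key=rep.get):
            conj.append(f"{rep[S1]} <= {rep[S]}")
        # Signature atoms +P / -P become inclusions in P or its complement.
        for sign, P in sorted(phi.get(S, ())):
            target = P if sign == "+" else f"{P}^c"
            conj.append(f"{rep[S]} <= {target}")
        # Cardinality bounds.
        conj.append(f"|{rep[S]}| >= {cinf[S]}")
        conj.append(f"|{rep[S]}| <= {csup[S]}")
        # Nodes of one split view are pairwise disjoint.
        for Q in split.get(S, ()):
            for S1, S2 in combinations(sorted(Q, key=rep.get), 2):
                conj.append(f"|{rep[S1]} & {rep[S2]}| = 0")
        # A complete view covers its father.
        for C in comp.get(S, ()):
            union = " | ".join(rep[S1] for S1 in sorted(C, key=rep.get))
            conj.append(f"{rep[S]} = {union}")
    return conj
```

The output grows linearly in the size of the diagram except for the split views, where one disjointness conjunct is produced per pair of nodes in a view.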

We translate a CBAC constraint into an i-diagram using the following
observations. It is sufficient to translate the following boolean
algebra expressions: $s_0 = s_1 \cup s_2$, $s_0 = s_1^c$, and $|s| = k$. We
construct an i-diagram whose nodes are singletons. We pick one
set variable $u$ to act as a universal set and put $\{s\} \in \mathsf{Sons}(\{u\})$
for every set variable $s$ in the i-diagram. We translate
$s_0 = s_1 \cup s_2$ as $\{\{s_1\},\{s_2\}\} \in \mathsf{Comp}(\{s_0\})$ and translate
$s_0 = s_1^c$ as $\{\{s_0\},\{s_1\}\} \in \mathsf{Split}(\{u\})$,
$\{\{s_0\},\{s_1\}\} \in \mathsf{Comp}(\{u\})$. We
translate $|s| = k$ as $\mathsf{CInf}(\{s\}) = k$ and $\mathsf{CSup}(\{s\}) = k$. Then
for each satisfying assignment of CBAC there is a model for the
constructed i-diagram where $\{u\}$ is interpreted as a universal set.
Conversely, for a model of i-diagram where q({s}) = A, we let 
[s h /Id a(u)]. The result is an assignment that satisfies the orig- 
inal CBAC formula. ■ 

Lemma |3| For every i-diagram T> we can obtain an equivalent 
simple i-diagram using the polynomial-time algorithm in Figure 6.

Proof. We argue that the algorithm in Figure 6 produces a diagram that
is i) well-formed, ii) simple, iii) equivalent to the original diagram.

We first observe that after step 2, the following two conditions
hold:

C) if $Q \in \mathsf{Split}(S)$ and $S \in Q$, then $Q \subseteq \{S, \emptyset_d\}$. This condition
holds because step 1.5 of the algorithm merges all nodes
$Q \setminus \{S\}$ with $\emptyset_d$ when $S \in Q \in \mathsf{Split}(S)$.

D) if $Q \in \mathsf{Comp}(S)$, then $Q \ne \emptyset$ (by step 1.3), and if $Q = \{\emptyset_d\}$ then
$S = \emptyset_d$ (by step 1.6).

i) To see that the resulting i-diagram is well-formed, it suffices
to check the conditions $\bigcup \mathsf{Split}(S) = \mathsf{Sons}(S)$ and
$\bigcup \mathsf{Comp}(S) \subseteq \mathsf{Sons}(S)$. This condition is preserved by
factor-diagram construction (for any equivalence relation). It is
preserved by step 3 for the following reason. The only nodes removed
from $\mathsf{Sons}(S)$ are $\emptyset_d$ and $S$. These nodes do not appear in
$\bigcup \mathsf{Comp}(S)$ because $\emptyset_d$ is removed from each view $Q$, and
views with $S \in Q$ are removed. It remains to check that
$\mathsf{Sons}(S) \subseteq \bigcup \mathsf{Split}(S)$ after step 3, and
this holds because our condition (C) implies that no element other
than $S$, $\emptyset_d$ is lost from $\bigcup \mathsf{Split}(S)$ in step 3. The
well-formedness condition is preserved by step 4 because this step does
not change $\bigcup \mathsf{Comp}(S)$ or $\bigcup \mathsf{Split}(S)$. Step 5 does not
violate this condition either because it sets the components of
$\emptyset_d$ to $\emptyset$.

ii) To see that the resulting diagram is simple, we show that
it satisfies conditions a),...,f) of Definition 6. After step 2 of the
algorithm, the resulting factor-diagram has no cycles of length 2
or more; there are only potentially some self-cycles. These are
eliminated in step 3 and no further edges are introduced. Hence,
a) holds. Each of the remaining conditions is enforced in a
certain step and not violated afterwards, according to the following
table:

    condition:   b)   c)   d)   e)   f)
    step:        3    4    4    4    5

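iii) We show that semantics is preserved when executing each
sequence of steps $1, \ldots, k$ for $2 \le k \le 5$, that is, each step
preserves the semantics provided that it is executed after the previous
steps.

k = 2. Each equality introduced into $\rho$ is a semantic consequence
of the diagram, because

1.1 $\alpha(S_1) \subseteq \alpha(S_2)$ and $\alpha(S_2) \subseteq \alpha(S_1)$,

1.2 $\alpha(S_1) \subseteq \alpha(\emptyset_d) = \emptyset$,

1.3 $\alpha(S_1) \subseteq \bigcup \emptyset = \emptyset$,

1.4 $\alpha(S_1) \cap \alpha(S_2) = \emptyset$ for $\alpha(S_1) = \alpha(S_2)$, so $\alpha(S_2) = \emptyset$, or

1.5 $\alpha(S_1) \cap \alpha(S_2) = \emptyset$, for $\alpha(S_1) = \alpha(S)$, and $\alpha(S_2) \subseteq \alpha(S)$,
so again $\alpha(S_2) = \emptyset$,

1.6 $\alpha(S_1) \subseteq \alpha(S)$ and $\alpha(S) \subseteq \bigcup \{\alpha(S_1)\} = \alpha(S_1)$.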

It follows that the condition on equality of sets, as well as the
conditions on $\mathsf{CInf}$, $\mathsf{CSup}$, $\mathsf{Sons}$, $\Phi$, $\mathsf{Comp}$ are all
semantically equivalent when applied to the original and the factor
diagram. The only semantic condition which can be lost in
factor-diagram construction is $\mathsf{disj}_{\mathcal{D}}(S_1, S_2)$ when the
nodes $S_1$ and $S_2$ are merged, that is,
when $(S_1, S_2) \in \rho$. However, in this case the disjointness
condition follows from $\alpha(S_1) = \emptyset$, which is enforced in 1.3. Therefore,
for the particular relation constructed in step 1, factor-diagram is
an equivalence preserving transformation.

k = 3. We need to show that no information is lost by removing
$\emptyset_d$ and $S$ from the sons, as well as split and complete views of
$S$. Clearly, removing $S$ and $\emptyset_d$ from $\mathsf{Sons}(S)$ does not change
the subset conditions because $\emptyset \subseteq \alpha(S)$ and $\alpha(S) \subseteq \alpha(S)$.
Eliminating $\emptyset_d$ from $Q \in \mathsf{Comp}(S)$ is justified because the view
has the same semantics with or without $\emptyset_d$. Dropping a view
$Q \in \mathsf{Comp}(S)$ for $S \in Q$ is justified because in that case
$\alpha(S) \subseteq \bigcup_{S_1 \in Q} \alpha(S_1)$ holds trivially. Eliminating
$\emptyset_d$ from $Q \in \mathsf{Split}(S)$
is justified because intersection with the empty set is always empty,
so this condition does not bring any new information. Finally,
dropping a $Q \in \mathsf{Split}(S)$ with $S \in Q$ is justified because condition
(C) implies that in such a case $Q \subseteq \{\emptyset_d, S\}$, so the Split condition is
trivial.

k = 4. Removing $\{\emptyset_d\}$ from $\mathsf{Split}(S)$ preserves semantics
because such a view carries no information. Similarly, because all
maximal views are preserved, removing their subsets does not
change the semantics. For $\mathsf{Comp}(S)$, we consider two cases. In the first
case $\emptyset \notin \mathsf{Comp}(S)$. In this case, removing $\emptyset$ does not have any
effect, and it is sound to remove all non-minimal views because
they are implied by the minimal views. The second case is
$\emptyset \in \mathsf{Comp}(S)$. By condition D) on step 1, we know that $Q \ne \emptyset$ after
step 2, and the only node removed in step 3 is $\emptyset_d$, so it must have
been the case that $Q = \{\emptyset_d\}$ after step 2. By condition D), we then
have $S = \emptyset_d$. Because the semantic condition on $\mathsf{Comp}$ for $Q = \emptyset$
reduces to $\alpha(S) = \emptyset$, this condition brings no new information, so
we can remove it.

k = 5. Because $\alpha(\emptyset_d) = \emptyset$, setting $\mathsf{CSup}(\emptyset_d) = 0$ does not change
semantics, and similarly for $\Phi(\emptyset_d) = \emptyset$. We also know that
$\mathsf{Sons}(S) \subseteq \{\emptyset_d\}$ because this condition is ensured by step 2 and is not violated
afterwards. Because we have already observed that the diagram
is well-formed, we conclude $\mathsf{Comp}(S) \subseteq \{\emptyset_d\}$ and
$\mathsf{Split}(S) \subseteq \{\emptyset_d\}$, so setting these values to $\emptyset$ does not change the semantics. ■

Theorem 2 Omitting any one out of three conditions from the
definition of i-trees (1. being tree-shaped, 2. having independent views, and 3.
having independent signatures) yields a class of diagrams whose
satisfiability is NP-hard.

Proof. Suppose that at least one of the three conditions does not
apply to a class of i-diagrams. We then give a reduction from the
problem 3COL to the satisfiability of i-diagrams in this class. Here
3COL denotes the NP-complete problem of deciding, given an
undirected graph, whether the graph can be colored using 3 colors
such that adjacent nodes have different colors [41, page 275].

Given a graph $(N, E)$ where $E \subseteq N \times N$ is a symmetric
irreflexive relation, we first build the i-diagram $\mathcal{D}$ defined as follows:
$\mathbb{S} = N \cup \{U\}$, $\mathsf{CInf}(U) = \mathsf{CSup}(U) = 3$, $\mathsf{Sons}(U) = N$,
$\mathsf{Split}(U) = \{\{n\} \mid n \in N\}$, and $\mathsf{Comp}(U) = \Phi(U) = \emptyset$. For
all $n \in N$ we let $\mathsf{CInf}(n) = \mathsf{CSup}(n) = 1$ and let
$\mathsf{Sons}(n) = \mathsf{Split}(n) = \mathsf{Comp}(n) = \Phi(n) = \emptyset$.

Each model of this diagram is a triple $(A, \alpha, \Xi)$ such that
$|\alpha(U)| = 3$ and for all $n$, $\alpha(n)$ is a singleton included in $\alpha(U)$.
If we consider $\alpha(U)$ as a set of three colors, then in each model
with this property, $\alpha(n)$ indicates the color of node $n$.

Then for each edge $(n_1, n_2) \in E$ we encode the fact that
$n_1$ and $n_2$ must have different colors by enforcing the property
$\alpha(n_1) \cap \alpha(n_2) = \emptyset$ on the models of $\mathcal{D}$. We encode this constraint
in different ways depending on the class of i-diagrams:

• If $\mathcal{D}$ allows dependent signatures, we introduce a fresh
predicate symbol $P_{n_1,n_2}$, and add $(+P_{n_1,n_2})$ to $\Phi(n_1)$, and
$(-P_{n_1,n_2})$ to $\Phi(n_2)$.

• If $\mathcal{D}$ allows dependent views, we add $\{n_1, n_2\}$ to $\mathsf{Split}(U)$.

• If $\mathcal{D}$ allows multiple fathers, then we simulate dependent
views by introducing a new node $m(n_1, n_2)$.
We let $\mathsf{CInf}(m(n_1, n_2)) = \mathsf{CSup}(m(n_1, n_2)) = 2$,
$\mathsf{Sons}(m(n_1, n_2)) = \{n_1, n_2\}$,
$\mathsf{Split}(m(n_1, n_2)) = \mathsf{Comp}(m(n_1, n_2)) = \{\{n_1, n_2\}\}$, and
$\Phi(m(n_1, n_2)) = \emptyset$.
We then remove $n_1, n_2$ from $\mathsf{Sons}(U)$, and add $m(n_1, n_2)$ to
$\mathsf{Sons}(U)$ instead.

None of these constructions violates more than one of the three
considered restrictions. It is straightforward to verify that the diagram
is satisfiable iff the graph is colorable, and that the construction of
$\mathcal{D}$ can be done in polynomial time. This proves that the
satisfiability of i-diagrams with any of the three restrictions removed is
NP-hard. ■
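
As an illustration of the dependent-views variant of this reduction, the following sketch builds the split views of the root $U$ from a graph and then checks satisfiability by brute force. The dictionary layout and the names `build_diagram` and `satisfiable` are hypothetical; the exhaustive search is only a sanity check for small graphs and is not an algorithm from the paper.

```python
from itertools import product


def build_diagram(nodes, edges):
    """Dependent-views encoding: root U of cardinality 3, singleton sons,
    one split view {n} per node plus one split view {n1, n2} per edge."""
    split_u = [frozenset({n}) for n in nodes]
    split_u += [frozenset(e) for e in edges]
    return {"card_U": 3, "sons_U": list(nodes), "split_U": split_u}


def satisfiable(diagram):
    """Brute force over models: alpha(n) is one element of a 3-element alpha(U);
    nodes that share a split view must be interpreted as disjoint sets."""
    sons = diagram["sons_U"]
    for choice in product(range(diagram["card_U"]), repeat=len(sons)):
        alpha = dict(zip(sons, choice))
        if all(len({alpha[n] for n in view}) == len(view)
               for view in diagram["split_U"]):
            return True
    return False


triangle = build_diagram("abc", [("a", "b"), ("b", "c"), ("a", "c")])
k4 = build_diagram("abcd", [("a", "b"), ("a", "c"), ("a", "d"),
                            ("b", "c"), ("b", "d"), ("c", "d")])
print(satisfiable(triangle), satisfiable(k4))
```

A triangle is 3-colorable while the complete graph on four vertices is not, so the two diagrams differ in satisfiability exactly as the graphs differ in colorability.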

A.2 Termination of System $\mathcal{R}$

LEMMA 17 (Invariants of $\mathcal{R}$). For every $k \in [1..12]$ the rule $R_k$
preserves $\bigwedge_{j=1}^{k-1} C_j$ or returns $\bot_d$.

Proof. We analyze each rule Rk , for two i-trees T and T such 
that T — >T , assuming that T satisfies A J= i (&— 1) Pj» ant ^' 

more precisely, T-^—*T', where spot are variable names as they 

-Re- 
appear in the definition of 1Z. 

1. (DnPhi). Trivial.

2. (Unsat). ($C_1$) does not depend on $\mathsf{CSup}$.

3. (UpSup). If $n = 0$, ($C_2$) is trivially true for $S'$ in $T'$. If
$n > 0$, by ($C_3$) $\mathsf{CSup}(S') > 0$ and we already have
$\forall P \in PN.\ \{+P, -P\} \not\subseteq \Phi(S)$, and since $\Phi' = \Phi$, ($C_2$) holds in $T'$.

4. (UpInf). Neither ($C_1$), ($C_2$) nor ($C_3$) depend on $\mathsf{CInf}$.

5. (Error). $T' = \bot_d$

6. (DnInf). Only ($C_4$) and ($C_5$) depend on $\mathsf{CInf}$.

• (C5) is maintained for S in T' because, noticing that (C5) is 
maintained for S' 7^ S, we have 

Clnf (5) = Clnf (S") - E(CSup(Q )) 

<CSup(5')-S(CSup(Q )) (byC 5 ) 
<CSup(S) (byC 3 ) 

• To prove that (C4) is maintained in T we need to check 
that (C4) is maintained for S, which is trivial by (ce), and 
that (C4) is maintained for the father S' of S and the views 
Q G Split(S ,/ ) containing S. By property of independent 



views there exists only one such view Q — ({S} tbl Q' ) 
such that Q' C Qo and 

EClnf (Q) = Clnf'(5)+ECInf(Q|,) 

= (Clnf(5")-ECSup(Qo)) + ECInf(^) 
= Clnf(5')-ECSup(Q -Qo) 

-rE s46Q& (Clnf(SS)-CSup(S'o)) 
<Clnf(5')-ECSup(Q -Q[)) (byC 5 ) 
< Clnf (S') 
= Clnf (5') 

7. (DnSup). Only (C 2 ), (C 3 ) and (C 5 ) depend on CSup. (C 2 ) is 
maintained thanks to (C7) as for the case of rule UpSup. 

• (C5) is maintained for S in T' because, noticing that (C5) is 
maintained for S' / S we have 

CSup' (5) = CSup(5") - E(Clnf(Q )) 

> Clnf (fir*) - E(Clnf(Q )) (byC 5 ) 

> Clnf (S) (byC 4 ) 

• To prove that (C3) is maintained in T we need to check 
that (C3) is maintained for S, which is trivial by (07), and 
that (C3) is maintained for the father S' of S and the views 
Q G Comp(5') containing 5*. By property of independent 
views there exists only one such view Q — ({S} l±) Qo) and 

ECSup'(Q) = CSup'(S')+ECSup(Qo) 

= (CSup(S")-ECInf(Qo))+ECSup(S ) 
= Clnf(S')+E So6Qo (CSup(fi*o)-Clnf(S )) 
> Clnf (5') (byC 5 ) 
= Clnf'(S") 

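8. (CCmp). Only ($C_3$) and ($C_6$) depend on $\mathsf{Comp}$.

• ($C_3$) is maintained for $S$, $Q$ because

\[
\begin{aligned}
\mathsf{CSup}'(S) &= \mathsf{CSup}(S) \\
 &\le \Sigma(\mathsf{CInf}(Q)) && \text{(by } a_8\text{)} \\
 &\le \Sigma(\mathsf{CSup}(Q)) && \text{(by } C_5\text{)} \\
 &= \Sigma(\mathsf{CSup}'(Q))
\end{aligned}
\]

• ($C_6$) is maintained for $S$, $Q$ and all $S_0 \in Q$ because

\[
\begin{aligned}
\mathsf{CInf}'(S) &= \mathsf{CInf}(S) \\
 &\le \mathsf{CSup}(S) && \text{(by } C_5\text{)} \\
 &\le \Sigma(\mathsf{CInf}(Q)) && \text{(by } a_8\text{)} \\
 &= \mathsf{CInf}(S_0) + \Sigma(\mathsf{CInf}(Q - \{S_0\})) \\
 &\le \mathsf{CInf}(S_0) + \Sigma(\mathsf{CSup}(Q - \{S_0\})) && \text{(by } C_5\text{)} \\
 &= \mathsf{CInf}'(S_0) + \Sigma(\mathsf{CSup}'(Q - \{S_0\}))
\end{aligned}
\]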

(Remark). If we use a simplification afterwards, as indicated
by the star in Figure 10, it can only consist in removing
a complete view $Q'$ such that $Q' \subseteq Q$. This operation trivially
maintains $\bigwedge_{j=1}^{7} C_j$ because in every consistency property
where complete views appear they are universally quantified.

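9. (CSplit). Only ($C_4$) and ($C_7$) depend on $\mathsf{Split}$.

• ($C_4$) is maintained for $S$, $Q$ because

\[
\begin{aligned}
\mathsf{CInf}'(S) &= \mathsf{CInf}(S) \\
 &\ge \Sigma(\mathsf{CSup}(Q)) && \text{(by } a_9\text{)} \\
 &\ge \Sigma(\mathsf{CInf}(Q)) && \text{(by } C_5\text{)} \\
 &= \Sigma(\mathsf{CInf}'(Q))
\end{aligned}
\]

• ($C_7$) is maintained for $S$, $Q$ and all $S_0 \in Q$ because

\[
\begin{aligned}
\mathsf{CSup}'(S) &= \mathsf{CSup}(S) \\
 &\ge \mathsf{CInf}(S) && \text{(by } C_5\text{)} \\
 &\ge \Sigma(\mathsf{CSup}(Q)) && \text{(by } a_9\text{)} \\
 &= \mathsf{CSup}(S_0) + \Sigma(\mathsf{CSup}(Q - \{S_0\})) \\
 &\ge \mathsf{CSup}(S_0) + \Sigma(\mathsf{CInf}(Q - \{S_0\})) && \text{(by } C_5\text{)} \\
 &= \mathsf{CSup}'(S_0) + \Sigma(\mathsf{CInf}'(Q - \{S_0\}))
\end{aligned}
\]

(Remark). If we use a simplification afterwards, as indicated
by the star in Figure 10, it can only consist in removing
a split view $Q'$ such that $Q' \subseteq Q$. This operation trivially
maintains $\bigwedge_{j=1}^{8} C_j$ because in every consistency property
where split views appear they are universally quantified.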

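10. (UpPhi). Only ($C_1$) and ($C_2$) depend on $\Phi$.

• ($C_1$) is maintained for $S$ and all $S' \in Q$ because

\[
\begin{aligned}
\Phi'(S) &= \Phi(S) \cup \bigcap \Phi(Q) && \text{(by } b_{10}\text{)} \\
 &= \Phi(S) \cup \bigl(\Phi(S') \cap \bigcap \Phi(Q - \{S'\})\bigr) \\
 &\subseteq \Phi(S) \cup \Phi(S') \\
 &\subseteq \Phi(S') && \text{(by } C_1\text{)} \\
 &= \Phi'(S')
\end{aligned}
\]

• We can also prove that ($C_2$) is maintained for $S$. Suppose
there exists $P$ in $PN$ such that $\{+P, -P\} \subseteq \Phi'(S)$. Then
we distinguish three cases.

■ if $\{+P, -P\} \subseteq \Phi(S)$, then by $C_2$, $\mathsf{CSup}(S) = 0$.

■ if $\{+P, -P\} \cap \Phi(S) = \emptyset$, then $\{+P, -P\} \subseteq \bigcap \Phi(Q)$.
Then for all $S' \in Q$, $\mathsf{CSup}(S') = 0$ by $C_2$. Then by $C_3$,
$\mathsf{CSup}(S) = 0$.

■ if $\{+P, -P\} \cap \Phi(S) = \{\pm P\}$ for one atom
$\pm P \in \{+P, -P\}$, then the opposite atom $\mp P$ belongs to
$\bigcap \Phi(Q)$ and by $C_1$, $\pm P$ belongs to $\bigcap \Phi(Q)$. By $C_2$, each
node $S' \in Q$ is such that $\mathsf{CSup}(S') = 0$ and by $C_3$,
$\mathsf{CSup}(S) = 0$.

In the three cases $\mathsf{CSup}'(S) = \mathsf{CSup}(S) = 0$.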

11. (Void). If $S$ is the root, the resulting i-tree $T'$ is such that
$\mathbb{S}' = \{SN\} = \{\emptyset_d\}$ and $C_i$ is trivial for all $i \in [1..11]$.
Otherwise, we have to check that removing the node $S$ from
the sons of the father of $S$ (as indicated in step 3 of the
procedure simplify) maintains $\bigwedge_{j=1}^{10} C_j$. We denote by $S_0$
the father of $S$ and by $Q_0$ the split view of $S_0$ containing $S$. If
$S$ is also contained in a complete view of $S_0$ we denote by $C_0$
this complete view.

• ($C_1$) and ($C_2$) are trivially maintained.

• (C 3 ) is maintained because, if Co exists and C' = Co — {S} 
is not empty 

CSup'(So) = CSup(S ) 

<ECSup(C ) (byC 3 ) 
= ECSup(C ) (by 012) 
= ECSup'(C ) 

• (C4) is maintained because, if Q' = Qo — {S} is not empty 

Clnf'(So)=Clnf(S ) 

>ECInf(Q ) (byC 4 ) 

>ECInf(Q ) 

= ECInf'(Q ) 

• (C5) is maintained for the node ®' d = S U 0d of T because 
by (012) and (C 5 ), Clnf(S) < CSup(S) = 0, by simplic- 
ity and (Cs), Clnf(0 d ) < CSup(0 d ) = 0, and therefore 
Clnf'(0' d ) = Maa;(Clnf(S),Clnf(0 d )) = < CSup'(0' d ). 

• (Ce) is maintained for So, Co and every Si G Co such that 
Si ^ S because 

Clnf (Si) = Clnf(Si) 

>Clnf(So)-E(CSup(C -{Si})) (byC 6 ) 
= Clnf(S )-E(CSup(C -{S}-{Si})) (by 01a) 
= Clnf'(S )-E(CSup'(C -{Si})) 



• (C7) is maintained for So, Qo and every Si G Qo such that 
Si ^ S because 

CSup'(Si) = CSup(Si) 

<CSup(S ) - E(Clnf(C - {Si})) (by C 7 ) 
< CSup(So) - E(Clnf(C - {S} - {Si})) 
= CSup'(So) - E(Clnf'(C - {Si})) 

• (Cs) is maintained for Qo because 

CSup'(So)=CSup(S ) 

>E(Clnf(Q )) (byC 8 ) 

>E(Clnf(Q -{So})) 

= E(Clnf'(Q )) 

• (Cg) is maintained for Co (if exists and C' = Co — {So} 7^ 
0) because 

Clnf'(S ) = Clnf(So) 

<E(CSup(Q )) (byC 9 ) 
= E(CSup(Q - {So})) (by an) 
= E(CSup'(Q )) 

• (C10) is maintained for Co (if exists and Co — Co — {So} 7^ 
0) because 

n < f'(a,)=n*(Co-{s }) 
cn$(c ) 

C$(S ) (byCio) 
= &(So) 

12. (Equal). Before merging $S$ and $S'$ for $\{S'\} \in \mathsf{Comp}(S)$ we
have

• $\mathsf{CInf}(S) = \mathsf{CInf}(S')$ by $C_4$ and $C_6$

• $\mathsf{CSup}(S) = \mathsf{CSup}(S')$ by $C_3$ and $C_7$

• $\Phi(S) = \Phi(S')$ by $C_1$ and $C_{10}$

Therefore all the properties $C_i$ for $i = 1..10$ are trivially
maintained.



A.3 Model Construction 

Lemma 6 (Model Construction) If an i-tree $T$ is weakly consistent,
then we can construct a model for $T$.

Proof. Let $T$ be a weakly consistent i-tree.

We construct a model $(A, \alpha, \Xi)$ by first constructing a partial
model $(A, \alpha)$ for all parts of $T$ except $\Phi$, and then extending
$(A, \alpha)$ with $\Xi$ to satisfy $\Phi$.

Constructing $(A, \alpha)$. We write $(A, \alpha) \models T$ to denote that $A$ and
$\alpha$ satisfy those conditions on $\mathbb{S}$, $\mathsf{Sons}$, $\mathsf{Split}$, $\mathsf{Comp}$, $\mathsf{CSup}$, $\mathsf{CInf}$
in the definition of a model that do not mention $\Phi$. To show we can construct $(A, \alpha)$
such that $(A, \alpha) \models T$ we prove by induction on $n$ the following
more general claim.

CLAIM 1. For every i-tree $T$ of height $n$ with root $S_R$:

\[
\forall k \in [\mathsf{CInf}(S_R), \mathsf{CSup}(S_R)].\ \exists (A, \alpha).\ (A, \alpha) \models T \wedge |\alpha(S_R)| = |A| = k
\]

If $n = 1$, the claim holds by $C_5$ taking

\[
A = \alpha(S_R) = \{1, \ldots, k\}
\]

For $n > 1$, consider an i-tree $T$ with root $S_R$ and
$k \in [\mathsf{CInf}(S_R), \mathsf{CSup}(S_R)]$. By examining the constraints in $T$, we
choose the cardinalities for subtrees of $T$, use the induction
hypothesis to construct models for subtrees, and paste the models for
subtrees into a model for $T$. We decompose this process into three
steps:

1. For each $C \in \mathsf{Comp}(S_R)$, consider a subtree $T_C$ built from $T$
by removing all sons outside $\bigcup C$. We construct a model $M_C$
for $T_C$ of cardinality $k$.

2. For each remaining $Q \in \mathsf{Split}(S_R)$ where $Q \not\subseteq \bigcup \mathsf{Comp}(S_R)$,
consider a subtree $T_Q$ with the same root $S_R$ but without the
sons outside $\bigcup Q$. We construct a model $M_Q$ for $T_Q$ of
cardinality $k$.

3. Because the constructed models have the same cardinality, we
can easily merge them to obtain a model for $T$.

Step 1. Let $C \in \mathsf{Comp}(S_R)$, and $\mathcal{C} \subseteq \mathsf{Split}(S_R)$ such that
$C = \bigcup \mathcal{C}$. For each $Q \in \mathcal{C}$, let
$t_Q \stackrel{\mathrm{def}}{=} \min(k, \Sigma(\mathsf{CSup}[Q]))$. Using
$k \ge \mathsf{CInf}(S_R)$ and $C_4$ we can show

\[
\Sigma(\mathsf{CInf}[Q]) \le t_Q \le \Sigma(\mathsf{CSup}[Q]) \qquad \text{(H1)}
\]

To each node $S \in Q$ we can therefore assign an integer
$K(S) \in [\mathsf{CInf}(S), \mathsf{CSup}(S)]$ such that $t_Q = \Sigma(K[Q])$. Let $T_S$ be the
sub-i-tree of $T$ rooted at $S$. By induction hypothesis, let $M_S$ be the
model of $T_S$ of cardinality $K(S)$. We can then take the disjoint
union of these models to construct a model $M_Q$ of size $t_Q$ for
the forest $\bigcup_{S \in Q} T_S$.
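
Property (H1) is what makes the assignment of $K(S)$ possible. A minimal sketch of one such assignment, assuming the nodes of a split view are given as a list of `(CInf, CSup)` pairs: start every node at its lower bound and distribute the remaining slack greedily. The function name `choose_cardinalities` and the greedy order are illustrative assumptions; any assignment within the bounds that sums to $t_Q$ would serve equally well.

```python
def choose_cardinalities(bounds, t):
    """Pick K(S) with CInf(S) <= K(S) <= CSup(S) and sum(K) == t.

    bounds: list of (cinf, csup) pairs for the nodes of one split view.
    t:      target total, assumed to satisfy (H1): sum(cinf) <= t <= sum(csup).
    """
    low = sum(c for c, _ in bounds)
    high = sum(s for _, s in bounds)
    if not low <= t <= high:
        raise ValueError("t violates (H1)")
    K = [c for c, _ in bounds]      # start at the lower bounds
    slack = t - low                 # amount still to be distributed
    for i, (c, s) in enumerate(bounds):
        extra = min(slack, s - c)   # never exceed the upper bound
        K[i] += extra
        slack -= extra
    return K


print(choose_cardinalities([(1, 3), (0, 2), (2, 2)], 5))
```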

For all $Q \in \mathcal{C}$, we have $t_Q \le k$ by definition of $t_Q$.
We can also prove that $k \le \Sigma_{Q \in \mathcal{C}}\, t_Q$. Indeed, if there exists a
$Q_0 \in \mathcal{C}$ such that $t_{Q_0} = k$, this is trivial. Otherwise, because $T$
is weakly consistent, from $C_3$ we can show that
$k \le \mathsf{CSup}(S_R) \le \Sigma(\mathsf{CSup}[C])$, and in this case

\[
\Sigma(\mathsf{CSup}[C]) = \sum_{Q \in \mathcal{C}} \Sigma(\mathsf{CSup}[Q]) = \sum_{Q \in \mathcal{C}} t_Q .
\]

We finally obtain

\[
\max_{Q \in \mathcal{C}} t_Q \le k \le \sum_{Q \in \mathcal{C}} t_Q \qquad \text{(H2)}
\]

Thanks to (H2), we can build a model for the i-tree $T_C$ as follows.
We start with the disjoint union of models $M_Q$ for $T_Q$ for $Q \in \mathcal{C}$.
This model has cardinality $\Sigma_{Q \in \mathcal{C}}\, t_Q$. Then, we rename elements
from different models to be identical to elements from other models.
Such merging is possible as long as there is no model whose
domain contains the domains of all others, so we can reach any
cardinality $k$ with $\max_{Q \in \mathcal{C}} t_Q \le k$.
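
One concrete way to carry out such a renaming, shown here only as a sketch, is to lay the models out one after another around a cycle of $k$ positions. The sketch abstracts each model $M_Q$ to its size $t_Q$; the function name `merge_models` and the cyclic layout are assumptions of this illustration, not the construction used in the paper.

```python
def merge_models(sizes, k):
    """Place models of the given sizes inside a universe {0, ..., k-1}.

    Requires (H2): max(sizes) <= k <= sum(sizes). Each model keeps its
    own size, and together the models cover the whole universe.
    """
    if not sizes or not (max(sizes) <= k <= sum(sizes)):
        raise ValueError("sizes and k violate (H2)")
    blocks, offset = [], 0
    for t in sizes:
        # t <= k, so the t consecutive positions modulo k are distinct.
        blocks.append({(offset + j) % k for j in range(t)})
        offset += t
    return blocks


blocks = merge_models([3, 2, 2], 4)
print(blocks)
```

Because the sizes sum to at least $k$, the consecutive blocks wrap around the cycle and every position is covered; because each size is at most $k$, no block collapses two of its own elements.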

REMARK 2. (Freedom in the choice of $\{t_i\}_{i \in [1..n]}$) In Section A.4.1
we enforce some additional properties on models using a different
choice of $t_Q$. Such a construction is possible whenever the $t_i$ satisfy
(H1) and (H2). Moreover, if $K(S)$ denotes some chosen cardinality
for each node $S$, and the values $K(S)$ satisfy certain assumptions,
then we can enforce additional properties when merging the
models $M_Q$ corresponding to $Q \in \mathsf{Split}(S)$. The following two
cases are of interest.

1. If $\sum_{Q \in \mathcal{C}} t_Q > K(S)$, we can choose any pair of different split
views $Q_1, Q_2 \in \mathcal{C}$, and two elements $x_1$ from $M_{Q_1}$ and $x_2$
from $M_{Q_2}$ and decide to merge them.

2. If $\mathcal{C} = \{Q_0\} \uplus \mathcal{C}_0$ and $\max_{Q \in \mathcal{C}_0} t_Q < K(S)$, we can choose any
element in the model $M_{Q_0}$ and decide not to merge it with any
of the elements of the models $M_{Q'}$ for $Q' \in \mathcal{C}_0$.

Step 2. Let $Q \in \mathsf{Split}(S_R)$, such that $Q$ is not included in any
complete view. We construct a model $M_Q$ of size $k$ for the i-forest
$T_Q$ by first building a model of size $K(S') = \mathsf{CInf}(S')$ for each
$S' \in Q$. Because $k \ge \mathsf{CInf}(S_R) \ge \Sigma(\mathsf{CInf}[Q])$, by $C_4$, the disjoint
union of these models has cardinality at most $k$. By adding
the correct number of fresh elements to $A$, we obtain a model of
cardinality $k$.
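
The padding in Step 2 can be sketched as follows, again abstracting each submodel to its size. The name `pad_with_fresh` and the use of consecutive integers as elements are assumptions made for illustration.

```python
def pad_with_fresh(sub_sizes, k):
    """Disjoint union of submodels, padded with fresh elements up to size k.

    sub_sizes: sizes CInf(S') of the models built for the nodes of Q.
    Returns (blocks, fresh): the interpretation of each node and the set of
    fresh elements that belong to the root but to none of these sons.
    """
    used = sum(sub_sizes)
    if used > k:
        raise ValueError("C4 is violated: the sons do not fit into k elements")
    blocks, offset = [], 0
    for t in sub_sizes:
        blocks.append(set(range(offset, offset + t)))  # pairwise disjoint
        offset += t
    fresh = set(range(offset, k))                      # padding elements
    return blocks, fresh


print(pad_with_fresh([1, 2], 5))
```

When the sum of the lower bounds is strictly smaller than $k$, the set of fresh elements is non-empty.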

REMARK 3. (Existence of fresh elements) In Section A.4.1 we use the
following property: For all $Q \in \mathsf{Split}(S)$ with $Q \not\subseteq \bigcup \mathsf{Comp}(S)$,
if $\Sigma(\mathsf{CInf}[Q]) < K(S)$, then there exists a model such that $\alpha(S)$
contains an element which does not belong to $\alpha(S')$ for any of
the sons $S'$ of $S$.

Step 3. We can apply an arbitrary bijection $\sigma_C : A_C \to [1..n]$
to each model $M_C$ constructed as described above to
build a model for the entire i-tree. We let $\alpha(S_R) = [1..n]$ and for
all $S \ne S_R$, $\alpha(S) = \sigma_C(S)$ where $C$ is the view containing an
ancestor of $S$ in $\mathsf{Split}(S_R)$ or $\mathsf{Comp}(S_R)$.

REMARK 4. (Freedom in the choice of $\sigma$.) If we know that
$K(S) > 0$, for any pair $S_1, S_2$ of sons of $S$ such that $S_1, S_2$ belong
neither to the same split view nor to the same complete view, for
each choice of elements $x_1, x_2$ in the models of $T_{S_1}$ and $T_{S_2}$ we can
choose $\sigma_1, \sigma_2$ such that $\sigma_1(x_1) = \sigma_2(x_2) \stackrel{\mathrm{def}}{=} x$, and the resulting
model will be such that $x \in \alpha(S_1) \cap \alpha(S_2)$.

Extending the model with $\Xi$. Let $(A, \alpha) \models T$. Then for each
$P \in PN$, define

\[
\Xi(P) = \bigcup \{\alpha(S) \mid S \in \mathbb{S} \wedge (+P) \in \Phi(S)\}
\]

Then $(+P) \in \Phi(S) \Rightarrow \alpha(S) \subseteq \Xi(P)$ holds by construction;
it remains to show $(-P) \in \Phi(S) \Rightarrow \alpha(S) \subseteq \Xi(P)^c$ for every
node $S$. Consider $S_1 \in \mathbb{S}$ such that $(-P) \in \Phi(S_1)$. If $S_1 = \emptyset_d$,
then $\alpha(S_1) = \emptyset$, so the condition trivially holds. Similarly, if
$(+P) \in \Phi(S_1)$, then by $C_2$, $\mathsf{CSup}(S_1) = 0$, so $\alpha(S_1) = \emptyset$ and the
condition holds. Otherwise, assume $(+P) \notin \Phi(S_1)$. For the sake
of contradiction suppose that there exists an element $x \in \alpha(S_1)$,
$x \in \Xi(P)$. By definition of $\Xi(P)$, there exists a node $S_2 \ne S_1$
such that $x \in \alpha(S_2)$ and $(+P) \in \Phi(S_2)$. By the condition on
independent signatures, one of the following two cases applies.

1. $\mathsf{disj}_T(S_1, S_2)$. Then $\alpha(S_1) \cap \alpha(S_2) = \emptyset$ by the semantics
of i-diagrams, which is a contradiction with $x \in \alpha(S_1)$ and
$x \in \alpha(S_2)$.

2. $S_1$ and $S_2$ have compatible signatures. Then there exists a
node $S$ such that $S_1 \leadsto^* S$, $S_2 \leadsto^* S$ and
$\mathsf{Sig}(S_1) \cap \mathsf{Sig}(S_2) \subseteq \mathsf{Sig}(S)$. Because $(-P) \in \Phi(S_1)$, $P \in \mathsf{Sig}(S_1)$, and
because $(+P) \in \Phi(S_2)$, $P \in \mathsf{Sig}(S_2)$. Therefore $P \in \mathsf{Sig}(S)$.
We have two cases:

(a) $(+P) \in \Phi(S)$. By $C_1$, then $(+P) \in \Phi(S_1)$, a contradiction.

(b) $(-P) \in \Phi(S)$. By $C_1$, then $(-P) \in \Phi(S_2)$. By $C_2$ then
$\mathsf{CSup}(S_2) = 0$, so $\alpha(S_2) = \emptyset$, a contradiction with
$x \in \alpha(S_2)$.

We have reached a contradiction in each case, so we conclude
$\alpha(S_1) \subseteq \Xi(P)^c$. ■
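
The definition of $\Xi$ and the two inclusions it must satisfy can be checked mechanically on small examples. The sketch below assumes a hypothetical representation in which `alpha` maps node names to Python sets and `phi` maps node names to sets of `(sign, predicate)` pairs; the helper names are illustrative only.

```python
def extend_with_predicates(alpha, phi):
    """Xi(P) = union of alpha(S) over all nodes S with +P in Phi(S)."""
    xi = {}
    for node, signature in phi.items():
        for sign, pred in signature:
            xi.setdefault(pred, set())       # predicates with only -P stay empty
            if sign == "+":
                xi[pred] |= alpha[node]
    return xi


def respects_signatures(alpha, phi, xi):
    """Check alpha(S) <= Xi(P) for +P and alpha(S) disjoint from Xi(P) for -P."""
    for node, signature in phi.items():
        for sign, pred in signature:
            if sign == "+" and not alpha[node] <= xi[pred]:
                return False
            if sign == "-" and alpha[node] & xi[pred]:
                return False
    return True


alpha = {"S1": {1, 2}, "S2": {3}, "S3": {1}}
phi = {"S1": {("+", "P")}, "S2": {("-", "P")}, "S3": set()}
xi = extend_with_predicates(alpha, phi)
print(xi, respects_signatures(alpha, phi, xi))
```

The positive inclusion holds by construction, as in the proof; the negative one can fail for an arbitrary `alpha`, which is exactly why the proof needs weak consistency and independent signatures.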

A.4 Details of the Proofs for Subsumption Completeness
A.4.1 Refinements of Lemma 6

According to the remarks in the proof of Lemma 6, if an i-tree $T$ is
weakly consistent, there exists a choice of cardinalities $K : \mathbb{S} \to \mathbb{N}$,
such that we can build a model $(A, \alpha, \Xi)$ for $T$ with the property
$|\alpha(S)| = K(S)$ for all $S \in \mathbb{S}$. For a fixed choice of cardinalities
$K$, we can, in certain cases, enforce some additional properties
by choosing which elements we merge in steps 1 and 3 of the
construction. The following three lemmas are based on this idea.

LEMMA 18 (Non-empty intersection (1)). Let $S_1$ and $S_2$ be nodes
in a weakly consistent i-tree $T$ such that

\[
\begin{aligned}
&\mathsf{CInf}(S_1) > 0 \wedge S_1 \leadsto^* S_1' \wedge S_1' \in Q_1 \ \wedge \\
&\mathsf{CInf}(S_2) > 0 \wedge S_2 \leadsto^* S_2' \wedge S_2' \in Q_2
\end{aligned}
\]

for some $S, S_1', S_2' \in \mathbb{S}$, $Q_1, Q_2 \in \mathsf{Split}(S)$ where $Q_1 \ne Q_2$ and
$\neg(\exists C \in \mathsf{Comp}(S).\ Q_1 \subseteq C \wedge Q_2 \subseteq C)$. Then there exists a
model $(A, \alpha, \Xi)$ for $T$ such that

\[
\alpha(S_1) \cap \alpha(S_2) \ne \emptyset .
\]

Proof. We use the construction described in Lemma 6 with the
exception of step 3 of the construction of the model $M_S$ for the
subtree $T_S$ with root $S$, where we do the following:

• We choose an element $x_1$ in $\alpha_1(S_1)$ in the model $M_{S_1'}$ built for
$T_{S_1'}$ (we know there exists one such $x_1$ because $\mathsf{CInf}(S_1) > 0$).

• Analogously, we choose an element $x_2$ in $\alpha_2(S_2)$ in the model
$M_{S_2'}$ built for $T_{S_2'}$.

• We choose the bijections $\sigma_{Q_1}$ and $\sigma_{Q_2}$ in step 3 such that
$\sigma_{Q_1}(x_1) = \sigma_{Q_2}(x_2)$. ■

LEMMA 19 (Non-empty intersection (2)). Let $S_1$ and $S_2$ be nodes
in a weakly consistent i-tree $T$ such that

\[
\begin{aligned}
&\mathsf{CInf}(S_1) > 0 \wedge S_1 \leadsto^* S_1' \wedge S_1' \in Q_1 \ \wedge \\
&\mathsf{CInf}(S_2) > 0 \wedge S_2 \leadsto^* S_2' \wedge S_2' \in Q_2
\end{aligned}
\]

for some $S, S_1', S_2' \in \mathbb{S}$, $Q_1, Q_2 \in \mathsf{Split}(S)$ where $Q_1 \ne Q_2$, and
$Q_1 \subseteq C$, $Q_2 \subseteq C$ for some $C \in \mathsf{Comp}(S)$ with the property

\[
\mathsf{CSup}(S) < \Sigma(\mathsf{CSup}[C]).
\]

Then there exists a model $(A, \alpha, \Xi)$ for $T$ such that

\[
\alpha(S_1) \cap \alpha(S_2) \ne \emptyset .
\]

Proof. We use the construction described in the proof of
Lemma 6 except for step 1 of the construction of the model
$M_S$ for the subtree $T_S$ with root $S$, for which we do the following.
From $K(S) \le \mathsf{CSup}(S) < \Sigma(\mathsf{CSup}[C])$, we conclude that the
choice of cardinalities $t_Q$ for $Q \in \mathcal{C}$ in the proof of Lemma 6 is
such that $\Sigma_{Q \in \mathcal{C}}\, t_Q > K(S)$, by considering two cases.

1. There exists $Q \in \mathcal{C}$ such that

\[
t_Q = \min(K(S), \Sigma(\mathsf{CSup}[Q])) = K(S).
\]

Then we choose $Q_i \in \{Q_1, Q_2\}$ such that $Q_i \ne Q$. Since
$\mathsf{CInf}(S_i) > 0 \wedge S_i \leadsto^* S_i' \leadsto S$, repeatedly applying $C_4$ and
using $C_5$, we have

• $\Sigma(\mathsf{CSup}[Q_i]) \ge \mathsf{CSup}(S_i') \ge \mathsf{CInf}(S_i') \ge \mathsf{CInf}(S_i) > 0$

• $K(S) \ge \mathsf{CInf}(S) \ge \mathsf{CInf}(S_i) > 0$

As a consequence $t_{Q_i} = \min(K(S), \Sigma(\mathsf{CSup}[Q_i])) > 0$ and
$\Sigma_{Q \in \mathcal{C}}\, t_Q \ge t_Q + t_{Q_i} > K(S)$.

2. For all $Q \in \mathcal{C}$, $t_Q = \Sigma(\mathsf{CSup}[Q])$. Then
$\Sigma_{Q \in \mathcal{C}}\, t_Q = \Sigma(\mathsf{CSup}[C]) > \mathsf{CSup}(S) \ge K(S)$.

Because $\Sigma_{Q \in \mathcal{C}}\, t_Q > K(S)$, we can apply Remark 2 and choose
one element $x_1$ in $\alpha_1(S_1)$ in the model $M_{S_1'}$ built for $T_{S_1'}$, and
an element $x_2$ in $\alpha_2(S_2)$ in the model built for $T_{S_2'}$, and decide
to merge these elements in step 1 of the construction. We
know that such elements $x_1, x_2$ exist because $\mathsf{CInf}(S_1) > 0$ and
$\mathsf{CInf}(S_2) > 0$. ■

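LEMMA 20 (Isolated element). Let $T$ be a weakly consistent i-tree
and $(Q_1, \leadsto)$ a subtree of $(\mathbb{S}, \leadsto)$ with the same root $S_R$ as
$T$, such that for all $S \in Q_1$ all the following conditions hold:

1. $\mathsf{CInf}(S) > 0$

2. $\forall C \in \mathsf{Comp}(S).\ \exists^{=1} Q \in \mathsf{Split}(S).\ Q \subseteq C \wedge |Q \cap Q_1| = 1$

3. $\forall Q \in \mathsf{Split}(S).\ |Q \cap Q_1| \ne 1 \Rightarrow (|Q \cap Q_1| = 0 \wedge \Sigma(\mathsf{CInf}[Q]) < \mathsf{CInf}(S))$.

Then we can construct a model for $T$ such that

\[
\exists x \in \alpha(S_R).\ \forall S \in \mathbb{S}.\ (x \in \alpha(S) \iff S \in Q_1)
\]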



Proof. We use a variation of the construction in the proof of
Lemma 6. We apply the assumptions about the subtree $Q_1$ to show
that we always have enough "slack" to avoid merging one specific
element from $Q_1$ with the elements of neighbors. We begin by
describing a slightly modified Step 1 of the proof of Lemma 6.

Step 1' (for nodes of $Q_1$). Consider the root $S_R \in Q_1$. Let
$C \in \mathsf{Comp}(S_R)$, and $\mathcal{C} \subseteq \mathsf{Split}(S_R)$ such that $C = \bigcup \mathcal{C}$. By
construction, there exists a unique son $S' \in Q_1$ of $S_R$ and a
corresponding split view $Q'$ such that $S' \in Q'$ and $\mathcal{C} = \{Q'\} \uplus \mathcal{C}_0$.
We define $t_{Q'} \stackrel{\mathrm{def}}{=} \min(k, \Sigma(\mathsf{CSup}[Q']))$ and for each $Q_0 \in \mathcal{C}_0$, we
define $t_{Q_0} \stackrel{\mathrm{def}}{=} \min(k - 1, \Sigma(\mathsf{CSup}[Q_0]))$. For each $Q_0 \in \mathcal{C}_0$, since
$k \ge \mathsf{CInf}(S_R)$ by choice of $k$ and $\mathsf{CInf}(S_R) > \Sigma(\mathsf{CInf}[Q_0])$ by
hypothesis 3 on $T$, we have $\Sigma(\mathsf{CInf}[Q_0]) \le k - 1$, so

\[
\Sigma(\mathsf{CInf}[Q_0]) \le t_{Q_0} \le \Sigma(\mathsf{CSup}[Q_0]) \qquad \text{(H1)}
\]

and the property (H1) also holds for $t_{Q'}$, because $t_{Q'}$ is defined as
in Lemma 6.

By definition of $t$ we clearly have $\max_{Q \in \mathcal{C}} t_Q \le k$. We next
show $\sum_{Q \in \mathcal{C}} t_Q \ge k$ by considering the following cases.

• $t_{Q'} = k$. Then the claim is obvious.

• For all $Q \in \mathcal{C}$ we have $t_Q = \Sigma(\mathsf{CSup}[Q])$. The claim follows
from $C_3$ and the choice of $k$ because
$\Sigma_{Q \in \mathcal{C}}\, t_Q \ge \Sigma(\mathsf{CSup}[C]) \ge \mathsf{CSup}(S_R) \ge k$.

• There exists $Q_0 \in \mathcal{C}_0$ such that $t_{Q_0} = k - 1$. Using
$\Sigma(\mathsf{CSup}[Q']) \ge \mathsf{CSup}(S') > 0$ and $k \ge \mathsf{CInf}(S_R) > 0$,
we obtain $t_{Q'} > 0$, so

\[
\sum_{Q \in \mathcal{C}} t_Q \ge t_{Q_0} + t_{Q'} \ge (k - 1) + 1 \ge k.
\]

We finally obtain

\[
\max_{Q \in \mathcal{C}} t_Q \le k \le \sum_{Q \in \mathcal{C}} t_Q \qquad \text{(H2)}
\]

By definition of all $t_Q$ we then have

\[
\forall Q_0 \in \mathcal{C}_0.\ t_{Q_0} < k \qquad \text{(H3)}
\]

According to Remark 2, (H3) allows us to choose an element of the
model $M_{S'}$ constructed for the subtree $T_{S'}$ and decide not to merge
it with any other element. This observation allows us to recursively
enforce $x \in \alpha(S) \iff S \in Q_1$.

Indeed, consider a node $S_R \in Q_1$ and let
$\{S_R^1, \ldots, S_R^p\} = \mathsf{Sons}(S_R) \cap Q_1$ be its sons in $Q_1$. For each $i$, we can then
recursively ensure $x_i \in \alpha(S) \iff S \in Q_1$ for each $S$ in the $S_R^i$
subtree, i.e. for each $S$ for which $S \leadsto^* S_R^i$. By definition of $Q_1$,
each $S_R^i$ is in a different complete view, so we can apply bijections
to the submodels (Remark 4) and let $\sigma_1(x_1) = \ldots = \sigma_p(x_p) = x$.
We ensure that $x$ does not belong to any subtree rooted at a node
$S_0 \in \mathsf{Sons}(S_R) \setminus Q_1$, using Remark 2 to make sure that $x$ is not
merged with any of the elements of $\alpha(S_0)$, which is possible thanks
to (H3). Finally, for the base case, when $S$ has no sons, we pick $x$ to
be a fresh element, which is possible by assumption 3 on the
subtree $Q_1$, as noted in Remark 3. ■

A.4.2 Links between weak and strong consistency 

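Lemma 8 (Bounds Refinement) Let $T$ be a strongly consistent
i-tree, $S \in \mathbb{S}$, and $i, s$ such that $\mathsf{CInf}(S) \le i \le s \le \mathsf{CSup}(S)$, let
$T' = T[\mathsf{CInf}(S) \leftarrow i, \mathsf{CSup}(S) \leftarrow s]$ and $T'_{NF} = \mathcal{R}^W_{NF}(T')$. Then
1) $T'_{NF} \ne \bot_d$, 2) $T'_{NF} \models T$, and 3) if $\neg(S \leadsto^* S_0)$, then

\[
(\mathsf{CInf}(S_0), \mathsf{CSup}(S_0)) = (\mathsf{CInf}'_{NF}(S_0), \mathsf{CSup}'_{NF}(S_0)).
\]

Proof.

1. We prove this result by induction on the depth of $S$ in the tree
$(\mathbb{S}, \leadsto)$. The key step of this proof is to show that the application
of UpSup and/or UpInf to the father $S'$ of $S$ does not produce
a situation where $a_5$ holds in the resulting diagram $T''$ (and
therefore the rule Error is not applicable in $T''$). We distinguish
three different cases:

• When UpSup and UpInf are both applicable to this node
$S'$ we have $\mathsf{CInf}''(S') \le \mathsf{CSup}''(S')$ because for
$C \in \mathsf{Comp}(S')$, $Q \in \mathsf{Split}(S')$ such that $Q \subseteq C$, $Q = \{S\} \uplus Q_0$,
$C = \{S\} \uplus C_0$ and
$T' \xrightarrow[\mathrm{UpSup}]{S', C} \xrightarrow[\mathrm{UpInf}]{S', Q} T''$ we have

\[
\begin{aligned}
\mathsf{CInf}''(S') &= \Sigma(\mathsf{CInf}'[Q_0]) + \mathsf{CInf}'(S) && \text{(by } b_4\text{)} \\
 &\le \Sigma(\mathsf{CInf}'[Q_0]) + \mathsf{CSup}'(S) && (i \le s) \\
 &\le \Sigma(\mathsf{CSup}'[Q_0]) + \mathsf{CSup}'(S) && \text{(by } C_5\text{)} \\
 &\le \Sigma(\mathsf{CSup}'[C_0]) + \mathsf{CSup}'(S) && (Q_0 \subseteq C_0) \\
 &= \mathsf{CSup}''(S') && \text{(by } b_3\text{)}
\end{aligned}
\]

• When only UpSup is applicable to this node $S'$ we have
$\mathsf{CInf}''(S') \le \mathsf{CSup}''(S')$ because for $C \in \mathsf{Comp}(S')$,
$C = \{S\} \uplus C_0$ and $T' \xrightarrow[\mathrm{UpSup}]{S', C} T''$ we have

\[
\begin{aligned}
\mathsf{CSup}''(S') &= \Sigma(\mathsf{CSup}'[C_0]) + \mathsf{CSup}'(S) && \text{(by } b_3\text{)} \\
 &\ge \Sigma(\mathsf{CSup}'[C_0]) + \mathsf{CInf}'(S) && (i \le s) \\
 &\ge \mathsf{CInf}'(S') && \text{(by } C_6\text{)} \\
 &= \mathsf{CInf}''(S')
\end{aligned}
\]

• When only UpInf is applicable to this node $S'$ we have
$\mathsf{CInf}''(S') \le \mathsf{CSup}''(S')$ because for $Q \in \mathsf{Split}(S')$,
$Q = \{S\} \uplus Q_0$ and $T' \xrightarrow[\mathrm{UpInf}]{S', Q} T''$ we have

\[
\begin{aligned}
\mathsf{CInf}''(S') &= \Sigma(\mathsf{CInf}'[Q_0]) + \mathsf{CInf}'(S) && \text{(by } b_4\text{)} \\
 &\le \Sigma(\mathsf{CInf}'[Q_0]) + \mathsf{CSup}'(S) && (i \le s) \\
 &\le \mathsf{CSup}'(S') && \text{(by } C_7\text{)} \\
 &= \mathsf{CSup}''(S')
\end{aligned}
\]

2. Follows easily from the hypothesis $\mathsf{CInf}(S) \le i \le s \le \mathsf{CSup}(S)$
and the fact that $\mathcal{R}^W_{NF}$ is semantics preserving.

3. It is enough to notice that only rules UpInf and UpSup are used
when applying $\mathcal{R}^W_{NF}$, and these rules are applied in the bottom-up
direction. ■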

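Lemma 9 (Parallel Bounds Refinement (1)). Let $T$ be a strongly
consistent i-tree, and let $Q_0$ be a subtree of $T$ which has the same
root as $T$ and is such that

• The nodes of $Q_0$ are pairwise independent, that is

\[ \forall S_1, S_2 \in Q_0.\ \neg(\mathrm{disj}_T(S_1, S_2)) \]

Then the i-tree $T'$ defined by

\[ T' \stackrel{\mathrm{def}}{=} T[\forall S \in Q_0 : \mathrm{CInf}(S) \leftarrow \mathrm{CSup}(S)] \]

is such that its $R_{NF}$ normal form $T'_{NF} = R_{NF}(T')$ satisfies

1. $T'_{NF} \neq \bot_d$

2. $T'_{NF} \models T$

3. $\forall S \in Q_0.\ \forall Q \in \mathrm{Split}'_{NF}(S).\ Q \cap Q_0 = \emptyset \Rightarrow \Sigma(\mathrm{CInf}'_{NF}[Q]) \le \mathrm{CInf}'_{NF}(S)$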

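Proof. If we apply $[\mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')]$ to
every node $S'$ of $Q_0$ starting from the root to the leaves, we always
maintain C1, C2, C3 because $\Phi$ and CSup are never modified. We also
maintain C4 because for each $S \in Q_0$ and each view
$Q \in \mathrm{Split}(S)$ such that there exists $S' \in Q \cap Q_0$, by
$\neg\mathrm{disj}_T(S_1, S_2)$, we know that $S'$ is the only modified
node, and

\[
\begin{array}{rcll}
\mathrm{CInf}'(S) & = & \mathrm{CSup}(S) & \\
 & \ge & \Sigma(\mathrm{CInf}[Q - \{S'\}]) + \mathrm{CSup}(S') & \text{(by C7)} \\
 & = & \Sigma(\mathrm{CInf}'[Q]) &
\end{array}
\]

Therefore, $T'$ is already in normal form, so $T'_{NF}$ is identical to
$T'$ and is clearly distinct from $\bot_d$, proving condition 1.
Condition 2 holds because $T' \models T$, since the cardinality bounds in
$T'$ are at least as strong as in $T$. Condition 3 holds because

\[
\begin{array}{rcll}
\Sigma(\mathrm{CInf}'[Q]) & = & \Sigma(\mathrm{CInf}[Q]) & \text{(because } Q \cap Q_0 = \emptyset\text{)} \\
 & \le & \mathrm{CSup}(S) & \text{(by C8)} \\
 & = & \mathrm{CInf}'(S) & \text{(by definition of } T'\text{)}
\end{array}
\]
■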



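Lemma 21 (Parallel Bounds Refinement (2)). Let $T$ be a strongly
consistent i-tree and $S_1, S_2, S'_1, S'_2, S \in \mathbb{S}$ and
$C, Q_1, Q_2$ such that

\[
\begin{array}{l}
C \in \mathrm{Comp}(S), \quad Q_1, Q_2 \in \mathrm{Split}(S) \\
S_1 \preceq S'_1 \wedge S'_1 \in Q_1 \wedge Q_1 \subseteq C \\
S_2 \preceq S'_2 \wedge S'_2 \in Q_2 \wedge Q_2 \subseteq C \\
Q_1 \neq Q_2
\end{array}
\]

Define

\[
\begin{array}{rcl}
T' & = & T[\forall S'.\ S_1 \preceq S' \preceq S'_1 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')] \\
   &   & \phantom{T}[\forall S'.\ S_2 \preceq S' \preceq S'_2 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')] \\
T'_{NF} & = & R_{NF}(T') \\
T'' & = & R_{NF}(T'_{NF}[\mathrm{CSup}''(S) \leftarrow \mathrm{CInf}'_{NF}(S)])
\end{array}
\]

Then $T'' \neq \bot_d$, $T'' \models T$, and
$\mathrm{CSup}''(S) \le \Sigma(\mathrm{CSup}''[C])$.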

Proof. As in the proof of Lemma 9, weak consistency conditions hold
in $T'$ for all nodes $S'$ such that $S_1 \preceq S' \preceq S'_1$ or
$S_2 \preceq S' \preceq S'_2$. Therefore, the only rewrite step that may
be applicable in $T'$ is the application of UpInf to $S$. This
application may lead to applications of other instances of UpInf, but
the proof of Lemma 8 shows that this process will result in a weakly
consistent i-tree, so $T'_{NF} \neq \bot_d$. Moreover, the process of
computing $R_{NF}(T'_{NF}[\mathrm{CSup}''(S) \leftarrow \mathrm{CInf}'_{NF}(S)])$
is identical to applying Lemma 8 to $T'$ with bounds
$i = s = \mathrm{CInf}'_{NF}(S)$, and therefore leads to a weakly
consistent i-tree $T''$, so $T'' \neq \bot_d$.

The condition $T'' \models T$ follows because $R_{NF}$ is semantics-
preserving, and the updates of trees only shrink the bounds on
nodes, so they convert a diagram into a stronger one.

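To prove $\mathrm{CSup}''(S) \le \Sigma(\mathrm{CSup}''[C])$, observe first
that $\mathrm{CSup}''(S) = \mathrm{CInf}'_{NF}(S)$ by definition of $T''$, and
$\Sigma(\mathrm{CSup}''[C]) = \Sigma(\mathrm{CSup}[C])$ because CSup does not
change for any ancestors of $S$. Therefore, it suffices to show

\[ \mathrm{CInf}'_{NF}(S) \le \Sigma(\mathrm{CSup}[C]) \]

We prove this condition by distinguishing two cases.

1. UpInf is not applicable to $S$. Then
$\mathrm{CInf}'_{NF}(S) = \mathrm{CInf}(S)$ and the condition follows by C9.

2. UpInf is applicable to $S$. Then for some $a, b$ where
$\{a, b\} = \{1, 2\}$ we have

\[
\begin{array}{rcl}
\mathrm{CInf}'_{NF}(S) & = & \Sigma(\mathrm{CInf}'[Q_a]) \\
 & \le & \Sigma(\mathrm{CInf}'[Q_a]) + \Sigma(\mathrm{CInf}'[Q_b]) \\
 & \le & \Sigma(\mathrm{CInf}'[C]) \\
 & \le & \Sigma(\mathrm{CSup}[C])
\end{array}
\]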



A.4.3 Completeness of the algorithm Subsumes 

Theorem 3. Let $T$ be a strongly consistent i-tree and let
$\mathrm{Subsumes}(T, A)$ for an atomic formula $A$ be as defined in
Figure 12. Then $\mathrm{Subsumes}(T, A)$ holds if and only if
$T \models A$.

The ($\Rightarrow$) direction of this theorem is trivial by the semantics
of i-diagrams. For the ($\Leftarrow$) direction we prove the following
characterizations:



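\[
\begin{array}{lcl}
S \neq \emptyset_d & \Rightarrow & \exists M.\ \alpha(S) \neq \emptyset \\
k \in [\mathrm{CInf}(S), \mathrm{CSup}(S)] & \Rightarrow & \exists M.\ |\alpha(S)| = k \\
\neg(\mathrm{Included}(S_0, C, T)) & \Rightarrow & \exists M.\ \alpha(S_0) \not\subseteq \bigcup \alpha[C] \\
S_1 \neq \emptyset_d \wedge \neg(S_1 \preceq S_2) & \Rightarrow & \exists M.\ \alpha(S_1) \not\subseteq \alpha(S_2) \\
S_1 \neq S_2 & \Rightarrow & \exists M.\ \alpha(S_1) \neq \alpha(S_2) \\
\emptyset_d \notin \{S_1, S_2\} \wedge \neg\mathrm{disj}_T(S_1, S_2) & \Rightarrow & \exists M.\ \alpha(S_1) \cap \alpha(S_2) \neq \emptyset \\
{+P} \notin \Phi(S) & \Rightarrow & \exists M.\ \alpha(S) \not\subseteq \Xi(P) \\
{-P} \notin \Phi(S) & \Rightarrow & \exists M.\ \alpha(S) \not\subseteq \Xi(P)^c
\end{array}
\]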



where $M$ denotes a model $M = (A, \alpha, \Xi)$ of $T$. We next present
the remaining lemmas that prove these characterizations (see also
Section^).

Lemma 12. If an i-tree $T$ is strongly consistent, $S_0 \in \mathbb{S}$,
$C \in \mathcal{P}(\mathbb{S})$, and $\mathrm{Included}(S_0, C, T)$ returns
false, then

\[ \exists M.\ \alpha(S_0) \not\subseteq \bigcup \alpha[C] \]

Proof. Let $Q_0$ be the smallest set of nodes such that:

\[
\begin{array}{l}
S_0 \preceq S \;\Rightarrow\; S \in Q_0 \\
S \in Q_0 \wedge Q_1 \in \mathrm{Comp}(S) \wedge S_1 \in Q_1 \wedge \neg\mathrm{Incl}(S_1, C) \;\Rightarrow\; S_1 \in Q_0
\end{array}
\]

By definition of Incl we know that $Q_0 \cap C = \emptyset$.

$Q_0$ is tree-shaped by construction, but may contain two nodes
which are explicitly disjoint. We compute a subtree $Q_1$ of $Q_0$
by starting from the root and keeping each time at most one son
for each complete view. We also require that $Q_1$ contains $S_0$, by
never cutting the branch which leads to $S_0$.

We then define

\[ T' = R_{NF}(T[\forall S \in Q_1 : \mathrm{CInf}(S) \leftarrow \mathrm{CSup}(S)]) \]

According to Lemma 9 (Parallel Bounds Refinement) we have
$T' \neq \bot_d$ and $T' \models T$.

We then apply Lemma 20 to construct a model for the weakly
consistent i-tree $T'$ such that

\[ \forall S' \in \mathbb{S}.\ (x \in \alpha(S') \Leftrightarrow S' \in Q_1) \]

for some element $x \in \alpha(S_0)$. Because $S_0 \in Q_1$, we have
$x \in \alpha(S_0)$. Because $C \cap Q_1 = \emptyset$, we have
$x \notin \bigcup \alpha[C]$. ■

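Lemma 15. If an i-tree $T$ is strongly consistent, then for all
$S \in \mathbb{S}$ and $P \in \mathcal{P}_N$ we have

\[
\begin{array}{lcl}
(+P) \notin \Phi(S) & \Rightarrow & \exists M.\ M \models T \wedge \alpha_M(S) \not\subseteq \Xi(P) \\
(-P) \notin \Phi(S) & \Rightarrow & \exists M.\ M \models T \wedge \alpha_M(S) \not\subseteq \Xi(P)^c
\end{array}
\]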

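Proof. Let $S \in \mathbb{S}$ be such that $(+P) \notin \Phi(S)$. We define
$Q_{+P} \stackrel{\mathrm{def}}{=} \{S' \in \mathbb{S} \mid (+P) \in \Phi(S')\}$.
Using C10 we show that $\mathrm{Included}(S, Q_{+P}, T)$ cannot return
true. Then, there exists a model such that
$\alpha(S) \not\subseteq (\bigcup \alpha[Q_{+P}])$. We then change the model
by redefining $\Xi'$ by:

\[ \forall P \in \mathcal{P}_N.\ \Xi'(P) = \bigcup \{\alpha(S') \mid S' \in Q_{+P}\} \]

to ensure that $\Xi'(P) \not\supseteq \alpha(S)$. The case of
$(-P) \notin \Phi(S)$ is analogous by taking a model such that
$\alpha(S) \not\subseteq (\bigcup \alpha[Q_{-P}])$ and redefining $\Xi'$ by:

\[ \forall P \in \mathcal{P}_N.\ \Xi'(P) = A - \bigcup \{\alpha(S') \mid S' \in Q_{-P}\} \]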
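The redefinition step in this proof can be illustrated on a toy interpretation. The following is only a sketch under stated assumptions: the node names, the universe, and the sets assigned by `alpha` are hypothetical, and the helper names `redefine_plus` and `redefine_minus` are not from the paper. It shows that once $\alpha(S)$ is not covered by the union over $Q_{+P}$ (or $Q_{-P}$), the redefined interpretation of $P$ yields the required witness.

```python
def redefine_plus(alpha, q_plus):
    """Xi'(P) := union of alpha(S') for all S' in Q_{+P}."""
    result = set()
    for node in q_plus:
        result |= alpha[node]
    return result


def redefine_minus(alpha, q_minus, universe):
    """Xi'(P) := universe minus the union of alpha(S') for S' in Q_{-P}."""
    covered = set()
    for node in q_minus:
        covered |= alpha[node]
    return set(universe) - covered


# Hypothetical model: three nodes interpreted over the universe {1, ..., 5}.
universe = {1, 2, 3, 4, 5}
alpha = {"S": {1, 2}, "S1": {2, 3}, "S2": {4}}
q_plus = {"S1", "S2"}   # assumed: the nodes labelled +P
q_minus = {"S1"}        # assumed: the nodes labelled -P

xi_plus = redefine_plus(alpha, q_plus)
outside = alpha["S"] - xi_plus    # elements of alpha(S) not in Xi'(P)

xi_minus = redefine_minus(alpha, q_minus, universe)
inside = alpha["S"] & xi_minus    # elements of alpha(S) that lie in Xi'(P)
```

In the `+P` case every labelled node stays inside the new interpretation while `outside` is non-empty, so $\alpha(S) \not\subseteq \Xi'(P)$; in the `-P` case `inside` is non-empty, so $\alpha(S) \not\subseteq \Xi'(P)^c$.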



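Lemma 16. If an i-tree $T$ is strongly consistent, then for all
$S_1, S_2 \in \mathbb{S}$ such that $S_1 \neq \emptyset_d$,
$S_2 \neq \emptyset_d$ we have

\[ \neg(\mathrm{disj}_T(S_1, S_2)) \Rightarrow \exists M.\ \alpha(S_1) \cap \alpha(S_2) \neq \emptyset. \]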



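Proof. Let $S_1, S_2 \in \mathbb{S} \setminus \{\emptyset_d\}$ such that
$\neg(\mathrm{disj}_T(S_1, S_2))$. If $S_1 = S_2$ we can find a model $M$
where $\alpha(S_1) = \alpha(S_2) \neq \emptyset$ by Lemma 10; in this model
$\alpha(S_1) \cap \alpha(S_2) = \alpha(S_2) \neq \emptyset$. Suppose
$S_1 \neq S_2$. Define $S'_1, S'_2, S_0$ as the unique nodes such that
$S_0$ is the least common ancestor of $S_1$ and $S_2$ in $T$, and
$S'_1, S'_2 \in \mathrm{Sons}(S_0)$ are the ancestors of $S_1$ and $S_2$,
respectively. We distinguish two cases:

• $S'_1$ and $S'_2$ do not belong to the same complete view of $S_0$.
Then apply Lemma 9 to the subtree

\[ Q_0 \stackrel{\mathrm{def}}{=} \{S \in \mathbb{S} \mid S_1 \preceq S \vee S_2 \preceq S\} \]

whose nodes are pairwise independent by the hypothesis
$\neg\mathrm{disj}_T(S_1, S_2)$. The resulting tree $T'_{NF}$ satisfies the
hypothesis of Lemma 18, so there exists a model $M = (A, \alpha, \Xi)$
for $T'_{NF}$ such that

\[ \alpha(S_1) \cap \alpha(S_2) \neq \emptyset. \]

$M$ is also a model of $T$ because $T'_{NF} \models T$.

• $S'_1$ and $S'_2$ belong to the same complete view $C$. Define

\[
\begin{array}{rcl}
T' & \stackrel{\mathrm{def}}{=} & R_{NF}(T[\forall S'.\ S_1 \preceq S' \preceq S'_1 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')] \\
   &   & \phantom{R_{NF}(T}[\forall S'.\ S_2 \preceq S' \preceq S'_2 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')]) \\
T'' & \stackrel{\mathrm{def}}{=} & R_{NF}(T'[\mathrm{CSup}''(S_0) \leftarrow \mathrm{CInf}'(S_0)])
\end{array}
\]

By Lemma 21, $T'' \models T$ and
$\mathrm{CSup}''(S_0) \le \Sigma(\mathrm{CSup}''[C])$.
This last property allows us to apply Lemma 19 and prove
the existence of a model $(A, \alpha, \Xi)$ for $T$ such that
$\alpha(S_1) \cap \alpha(S_2) \neq \emptyset$. ■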



B. Example transformation to CBAC constraints 

We illustrate the idea of the algorithm in Figure 2 through an
example of checking the validity of the formula

\[ |A \cup B| = |A| + |B| - |A \cap B| \]

that is, checking the unsatisfiability of the formula

\[ |A \cup B| \neq |A| + |B| - |A \cap B| \]
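As a sanity check of what the example is expected to establish, the negated formula can be tested exhaustively over a small universe. This is only an illustrative brute-force sketch, not the algorithm of the paper: the helper names `subsets` and `find_counterexample` and the choice of a 4-element universe are assumptions, and the enumeration covers only that universe rather than proving validity in general.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of the given finite universe as a set."""
    items = list(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def find_counterexample(universe):
    """Return sets (A, B) with |A u B| != |A| + |B| - |A n B|, or None."""
    for a in subsets(universe):
        for b in subsets(universe):
            if len(a | b) != len(a) + len(b) - len(a & b):
                return a, b
    return None


# Search all pairs of subsets of a hypothetical 4-element universe.
counterexample = find_counterexample(range(4))
```

No pair of subsets satisfies the negated formula here, which is consistent with the outcome the non-deterministic procedure reaches for this example.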

This formula has no integer variables initially, so the first step does 
not apply. Furthermore, all set algebra expressions appear already 
within cardinality constraints, so step 2 is unnecessary as well, as 
is step 3 because there are no divisibility constraints. 

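In step 4, we introduce a non-negative integer variable for each
cardinality term, yielding the system:

\[
\begin{array}{rcl}
i_{A \cup B} & \neq & i_A + i_B - i_{A \cap B} \\
|1| & = & \mathrm{MAXC} \\
|A \cup B| & = & i_{A \cup B} \\
|A| & = & i_A \\
|B| & = & i_B \\
|A \cap B| & = & i_{A \cap B}
\end{array}
\]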

Because the formula is already in the form of a conjunction, there 
is no need to non-deterministically guess the conjunction in step 5, 
and we continue with the current constraints. 

In step 6, we non-deterministically replace
$i_{A \cup B} \neq i_A + i_B - i_{A \cap B}$ with
$i_{A \cup B} + 1 \le i_A + i_B - i_{A \cap B}$ or
$i_A + i_B - i_{A \cap B} + 1 \le i_{A \cup B}$, and we illustrate the
first case (the other case needs to be checked as well unless the
satisfying assignment is found in the





first case): 

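\[
\begin{array}{rcl}
i_{A \cup B} + 1 & \le & i_A + i_B - i_{A \cap B} \\
|1| & = & \mathrm{MAXC} \\
|A \cup B| & = & i_{A \cup B} \\
|A| & = & i_A \\
|B| & = & i_B \\
|A \cap B| & = & i_{A \cap B}
\end{array}
\]
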
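In step 7, we introduce a slack variable $i_0$ to eliminate the first
inequation, and transform the equation to normal form:

\[
\begin{array}{rcl}
i_{A \cup B} - i_A - i_B + i_{A \cap B} + i_0 & = & -1 \\
|1| & = & \mathrm{MAXC} \\
|A \cup B| & = & i_{A \cup B} \\
|A| & = & i_A \\
|B| & = & i_B \\
|A \cap B| & = & i_{A \cap B}
\end{array}
\]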
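The slack-variable transformation of step 7 can be checked mechanically on small values. The sketch below is an illustration only; the function names and the tested range are assumptions. It confirms that the inequality from step 6 holds exactly when the normal-form equation has a non-negative solution for the slack variable.

```python
from itertools import product


def inequality_holds(i_union, i_a, i_b, i_inter):
    """First case of step 6: i_union + 1 <= i_a + i_b - i_inter."""
    return i_union + 1 <= i_a + i_b - i_inter


def slack_solution(i_union, i_a, i_b, i_inter):
    """Solve i_union - i_a - i_b + i_inter + i0 = -1 for i0.

    Returns i0 if it is non-negative, otherwise None.
    """
    i0 = -1 - (i_union - i_a - i_b + i_inter)
    return i0 if i0 >= 0 else None


# Compare both formulations on all 4-tuples with entries in 0..4.
agree = all(
    inequality_holds(*values) == (slack_solution(*values) is not None)
    for values in product(range(5), repeat=4)
)

# Slack value for the cardinalities (100, 77, 40, 10).
example_slack = slack_solution(100, 77, 40, 10)
```

The slack variable simply absorbs the gap between the two sides of the inequality, so requiring it to be non-negative recovers the original constraint.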

In step 8, we identify
$v = (\mathrm{MAXC}, i_{A \cup B}, i_A, i_B, i_{A \cap B}, i_0)$ and
have $A = [0, 1, -1, -1, 1, 1]$ and $d = (-1)$. We have $m_0 = 1$,
$m_1 = 5$, $n_0 = 6$, and $S = 2$. Therefore, $m = 6$, $n = 6$, and $a = 1$.
From these values we compute $M = 6 \cdot (6 \cdot 1)^{11} = 6^{12} < 2^{32}$.
Therefore, if there exists a solution to the system of equations, there
exists a solution that can be represented by at most 32 bits in binary.
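The arithmetic behind the 32-bit claim is easy to reproduce. The snippet below is a minimal check that uses only the values and the exponent as they appear in the text above; the variable names are illustrative.

```python
# Values stated in step 8.
n, m, a = 6, 6, 1

# Bound as written in the text: 6 * (6 * 1)^11.
bound = n * (m * a) ** 11

# Number of binary digits needed to write the bound itself.
bits = bound.bit_length()
```

Since `bound` equals 6^12 and lies strictly below 2^32, every component of a solution bounded by it fits in 32 bits.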

In step 9, we guess non-deterministically a solution vector
bounded by $M$ that satisfies the first equation (that is, $Ak = -1$).
One such solution is $k = (101, 100, 77, 40, 10, 6)$ (written in
decimal notation). For this guessed solution, we generate the CBAC
constraint:

\[
\begin{array}{rcl}
|1| & = & 101 \\
|A \cup B| & = & 100 \\
|A| & = & 77 \\
|B| & = & 40 \\
|A \cap B| & = & 10
\end{array}
\]

This example CBAC constraint does not have a solution, and nei- 
ther do the remaining examples generated by the non-deterministic 
algorithm. The search tree corresponding to the non-deterministic 
algorithm returns false in all branches, so the formula is unsatisfi- 
able, and its negation is valid. 
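Both claims about the guessed vector can be verified directly: that it satisfies the integer equation, and that the generated CBAC constraint has no solution. The sketch below is a hand-written check specialised to two set variables, not the polynomial-space CBAC procedure of the paper; the function names are hypothetical. It relies on the fact that for two sets the sizes of the Venn regions are uniquely determined by $|A|$, $|B|$, and $|A \cap B|$.

```python
def venn_regions(card_a, card_b, card_inter):
    """Sizes of the regions A-only, B-only, and A-and-B."""
    only_a = card_a - card_inter
    only_b = card_b - card_inter
    return only_a, only_b, card_inter


def cbac_satisfiable(card_univ, card_union, card_a, card_b, card_inter):
    """Check whether sets A, B with the given cardinalities can exist."""
    only_a, only_b, both = venn_regions(card_a, card_b, card_inter)
    union = only_a + only_b + both
    outside = card_univ - union   # elements in neither A nor B
    all_non_negative = min(only_a, only_b, both, outside) >= 0
    return all_non_negative and union == card_union


# Coefficient row A and guessed vector k from steps 8 and 9.
coefficients = [0, 1, -1, -1, 1, 1]
k = (101, 100, 77, 40, 10, 6)

lhs = sum(c * x for c, x in zip(coefficients, k))
guess_satisfies_equation = lhs == -1

# The first five components of k are the cardinalities in the CBAC constraint.
constraint_satisfiable = cbac_satisfiable(*k[:5])
```

Here the regions force $|A \cup B| = 67 + 30 + 10 = 107$, which contradicts the required value 100, so this branch of the search fails as stated.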

We note that, in this case, the dimensions of the system
of equations are such that the estimate $M$ can be improved
if we consider the system of equations where the variables
$\mathrm{MAXC}, i_{A \cup B}, i_A, i_B, i_{A \cap B}$ are all substituted
into the original integer part of the QFBAPA problem. In general, this
alternative estimate is given by $M' = n'(m'a')^{2m'+1}$ where

\[
\begin{array}{rcl}
m' & = & m_0 \\
n' & = & \max(n_0 - m_1, 2) \\
a' & = & \max\Big( \max\limits_{1 \le q \le m} |d_q|,\;
          \max\limits_{p=1}^{m} \max\limits_{q \in R} |a_{pq}|,\;
          \max\limits_{p=1}^{m} \max\Big( \sum\limits_{q \in Q,\ a_{pq} > 0} a_{pq},\;
          \sum\limits_{q \in Q,\ a_{pq} < 0} (-a_{pq}) \Big) \Big)
\end{array}
\]

where $Q = \{p_1, \ldots, p_{m_1}\}$ are indices of variables denoting
cardinalities (and that are being substituted), and
$R = \{1, \ldots, n_0\} \setminus Q$. In our example,

\[
\begin{array}{rcl}
m' & = & 1 \\
n' & = & 4 \\
a' & = & \max(1, 1, 2)
\end{array}
\]

and $M' = 16$, so it suffices to use only 4 bits to represent the
constants in the resulting CBAC constraints generated in step 10.